This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Make SIInsertWaits about a factor of 4 faster
Closed, Public

Authored by arsenm on Sep 25 2015, 5:52 PM.

Details

Reviewers
tstellarAMD
Summary

This was the slowest custom target pass. It was spending 80%
of its time in getMinimalPhysRegClass, which was called
for every register operand.

When possible, use the statically known register class from
the instruction's MCOperandInfo. A few pseudo instructions are
not well behaved and have unknown register classes, so they still
require the expensive physical register class search.
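
As a rough illustration of the fast path and fallback described above, here is a minimal, self-contained sketch. It does not use LLVM: `ToyRegClass`, `ToyOperandInfo`, `ToyRegInfo`, `makeToyRegInfo`, and `UnknownRC` are hypothetical stand-ins invented for this example, and the real patch may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy model; these are NOT the real LLVM types.
constexpr int UnknownRC = -1; // operand has no statically known class

struct ToyRegClass {
  int ID;
  std::vector<unsigned> Regs; // physical registers in this class
};

// Plays the role of the per-operand static description of an instruction.
struct ToyOperandInfo {
  int RegClassID; // UnknownRC for badly behaved pseudo instructions
};

struct ToyRegInfo {
  std::vector<ToyRegClass> Classes; // indexed by class ID
  mutable unsigned SlowLookups = 0; // counts uses of the expensive path

  // Expensive path: scan every class for the smallest one containing Reg.
  const ToyRegClass &minimalPhysRegClass(unsigned Reg) const {
    ++SlowLookups;
    const ToyRegClass *Best = nullptr;
    for (const ToyRegClass &RC : Classes)
      for (unsigned R : RC.Regs)
        if (R == Reg && (!Best || RC.Regs.size() < Best->Regs.size()))
          Best = &RC;
    assert(Best && "register is not a member of any class");
    return *Best;
  }

  // Cheap path first: use the static class if the descriptor provides one,
  // and only fall back to the search when it does not.
  const ToyRegClass &classForOperand(const ToyOperandInfo &Info,
                                     unsigned Reg) const {
    if (Info.RegClassID != UnknownRC)
      return Classes[static_cast<std::size_t>(Info.RegClassID)];
    return minimalPhysRegClass(Reg);
  }
};

// Builds three toy classes: {0..3}, {4,5}, and a superclass {0..5}.
inline ToyRegInfo makeToyRegInfo() {
  ToyRegInfo TRI;
  TRI.Classes = {{0, {0, 1, 2, 3}}, {1, {4, 5}}, {2, {0, 1, 2, 3, 4, 5}}};
  return TRI;
}
```

In this sketch, an operand with a known class ID never touches the search, so the cost of the scan is paid only for the operands that lack static information.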

There are a few other possibilities for making this even faster,
such as not inspecting implicit operands. For now those are checked
because it is technically possible to have a scalar load into
exec or vcc, which can then be used implicitly.
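
To make the implicit-operand concern concrete, the following toy sketch shows how skipping implicit operands could miss a dependency on a pending load into vcc. Everything here is hypothetical and invented for illustration (`ImplicitDemoOperand`, `needsWaitFor`, `makeCarryUser`, and the register numbers); it is not the pass's actual logic.

```cpp
#include <vector>

// Hypothetical register numbers for this example only.
constexpr unsigned DemoV0 = 0;
constexpr unsigned DemoV1 = 1;
constexpr unsigned DemoVCC = 100;

struct ImplicitDemoOperand {
  unsigned Reg;
  bool IsUse;      // true if the instruction reads the register
  bool IsImplicit; // true if the operand is not written out explicitly
};

// Returns true if any inspected use operand reads PendingReg, the
// destination of a load that has not completed yet.
inline bool needsWaitFor(const std::vector<ImplicitDemoOperand> &Ops,
                         unsigned PendingReg, bool InspectImplicit) {
  for (const ImplicitDemoOperand &Op : Ops) {
    if (!Op.IsUse)
      continue;
    if (Op.IsImplicit && !InspectImplicit)
      continue; // the cheaper variant skips these operands
    if (Op.Reg == PendingReg)
      return true;
  }
  return false;
}

// A toy instruction: defines v0, explicitly reads v1, implicitly reads vcc.
inline std::vector<ImplicitDemoOperand> makeCarryUser() {
  return {{DemoV0, false, false}, {DemoV1, true, false}, {DemoVCC, true, true}};
}
```

With `InspectImplicit` set to false, the implicit read of vcc goes unnoticed, which is the case the summary says is technically possible and therefore still checked.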

Event Timeline

arsenm updated this revision to Diff 35787. Sep 25 2015, 5:52 PM
arsenm retitled this revision to AMDGPU: Make SIInsertWaits about a factor of 4 faster.
arsenm updated this object.
arsenm added a reviewer: tstellarAMD.
arsenm added a subscriber: llvm-commits.
tstellarAMD accepted this revision. Oct 1 2015, 1:40 PM
tstellarAMD edited edge metadata.

LGTM.

This revision is now accepted and ready to land. Oct 1 2015, 1:40 PM
arsenm closed this revision. Oct 1 2015, 2:45 PM

r249079