
AMDGPU/SILoadStoreOptimizer: Optimize scanning for mergeable instructions
Needs Review · Public

Authored by tstellar on Aug 8 2019, 10:39 AM.



This adds a pre-pass to this optimization that scans through the basic
block and generates lists of mergeable instructions, with one list per unique
address.

In the optimization phase, instead of scanning through the basic block for
mergeable instructions, we now iterate over the lists generated by the pre-pass.
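A minimal sketch of the pre-pass idea, not the code from this revision. The `Inst` struct, its integer `BaseAddr` key, and the name `collectMergeableInsts` are assumptions made for illustration; the real pass operates on machine instructions and compares richer address information.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <map>
#include <vector>

// Hypothetical stand-in for a memory instruction (illustration only).
struct Inst {
  int Id;       // position in the basic block
  int BaseAddr; // stand-in for the base address the access uses
  int Offset;   // immediate offset from that base
};

using MergeList = std::list<Inst>;

// Pre-pass: walk the block once and bucket instructions by base address.
// Lists are kept in order of first appearance of each address.
std::list<MergeList> collectMergeableInsts(const std::vector<Inst> &Block) {
  std::list<MergeList> Lists;
  // Maps each address to its list; std::list iterators stay valid as
  // further lists are appended.
  std::map<int, std::list<MergeList>::iterator> ListForAddr;
  for (const Inst &I : Block) {
    auto It = ListForAddr.find(I.BaseAddr);
    if (It == ListForAddr.end()) {
      Lists.emplace_back();
      It = ListForAddr.emplace(I.BaseAddr, std::prev(Lists.end())).first;
    }
    // Invariant: every list holds instructions for exactly one address.
    assert(It->second->empty() ||
           It->second->front().BaseAddr == I.BaseAddr);
    It->second->push_back(I);
  }
  return Lists;
}
```

The optimization phase can then visit each returned list on its own instead of rescanning the whole block.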

The decision to re-optimize a block is now made per list, so if we fail to merge any
instructions with the same address, then we do not attempt to optimize them in
future passes over the block. This will help to reduce the time this pass
spends re-optimizing instructions.
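The per-list retry decision could look roughly like the sketch below. It is a simplified model under stated assumptions: `Access`, `tryMergeOnce`, and this `optimizeBlock` are hypothetical, and a "merge" here just fuses two equally wide accesses whose offsets are adjacent, with the lower offset appearing first.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Hypothetical model of one access: byte offset and width (illustration only).
struct Access {
  int Offset;
  int Width;
};

using MergeList = std::list<Access>; // accesses that share one base address

// Fuse the first pair of same-width accesses where the second starts right
// after the first. Returns false if the list contains no such pair.
bool tryMergeOnce(MergeList &L) {
  for (auto A = L.begin(); A != L.end(); ++A)
    for (auto B = std::next(A); B != L.end(); ++B)
      if (A->Width == B->Width && A->Offset + A->Width == B->Offset) {
        A->Width *= 2;
        L.erase(B);
        return true;
      }
  return false;
}

// Repeatedly sweep the lists. A list that fails to merge (or has fewer than
// two entries) is dropped and never revisited; a list that merged is retried
// on the next sweep. Returns the total number of merges.
unsigned optimizeBlock(std::list<MergeList> &Lists) {
  unsigned Merges = 0;
  while (!Lists.empty()) {
    for (auto It = Lists.begin(); It != Lists.end();) {
      if (It->size() > 1 && tryMergeOnce(*It)) {
        assert(!It->empty()); // a merge removes one entry, never all
        ++Merges;
        ++It;
      } else {
        It = Lists.erase(It);
      }
    }
  }
  return Merges;
}
```

Every successful merge shrinks its list by one entry, so the loop terminates, and a list with nothing left to merge costs only one failed attempt.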

In one pathological test case, this change reduces the time spent in the
SILoadStoreOptimizer from 0.2s to 0.03s.

This restructuring will also make it possible to implement further improvements
to this pass, because we can now add less expensive checks to the pre-pass and
filter instructions out early, which avoids the need to do the expensive
scanning during the optimization pass. For example, checking for adjacent
offsets is an inexpensive test we can move to the pre-pass.
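As an illustration of such an early filter, here is a sketch that drops accesses with no adjacent neighbour before the expensive scan runs. This check is not part of this revision; `Access` and `filterByAdjacentOffset` are hypothetical names, and "adjacent" is assumed to mean a same-width access that starts exactly one width before or after.

```cpp
#include <cassert>
#include <list>
#include <set>
#include <utility>

// Hypothetical model of one access: byte offset and width (illustration only).
struct Access {
  int Offset;
  int Width;
};

// Remove every access that has no same-width neighbour directly before or
// after it. All entries are assumed to share one base address.
void filterByAdjacentOffset(std::list<Access> &L) {
  std::set<std::pair<int, int>> Seen; // (offset, width) of every access
  for (const Access &A : L) {
    assert(A.Width > 0); // precondition of this sketch
    Seen.insert(std::make_pair(A.Offset, A.Width));
  }
  L.remove_if([&Seen](const Access &A) {
    bool HasNext = Seen.count(std::make_pair(A.Offset + A.Width, A.Width)) != 0;
    bool HasPrev = Seen.count(std::make_pair(A.Offset - A.Width, A.Width)) != 0;
    return !HasNext && !HasPrev;
  });
}
```

The filter only does set lookups, so it stays cheap compared with the pairwise scanning done during the optimization phase.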

Event Timeline

tstellar created this revision. Aug 8 2019, 10:39 AM
Herald added a project: Restricted Project. Aug 8 2019, 10:39 AM
arsenm added inline comments. Thu, Sep 5, 11:49 AM

Typo s/on/no


Why std::list, and a std::list of lists?

tstellar marked an inline comment as done. Fri, Sep 13, 5:38 PM
tstellar added inline comments.

The main reason to use lists is so I can remove items without invalidating iterators.
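A small standalone illustration of that property, not taken from the patch: `std::list::erase` invalidates only iterators to the erased element, so an outer iterator survives while a later, paired element is removed. The `erasePairs` helper and the "value plus one" pairing rule are made up for the example.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// For each element, erase the first later element equal to its value plus
// one (its hypothetical "pair"). The outer iterator stays usable across the
// erase. Returns the number of elements erased.
unsigned erasePairs(std::list<int> &L) {
  unsigned Erased = 0;
  for (auto It = L.begin(); It != L.end(); ++It) {
    for (auto Paired = std::next(It); Paired != L.end(); ++Paired) {
      if (*Paired == *It + 1) {
        int Before = *It;
        L.erase(Paired);      // only Paired is invalidated
        assert(*It == Before); // It still refers to the same element
        ++Erased;
        break;
      }
    }
  }
  return Erased;
}
```

With a `std::vector`, the same erase would invalidate every iterator at or after the erased position, so the loop would need index bookkeeping instead.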