Page MenuHomePhabricator

AMDGPU/SILoadStoreOptimizer: Improve merging of out of order offsets
AcceptedPublic

Authored by tstellar on Aug 8 2019, 12:50 PM.

Details

Summary

This improves merging of sequences like:

store a, ptr + 4
store b, ptr + 8
store c, ptr + 12
store d, ptr + 16
store e, ptr + 20
store f, ptr

Prior to this patch the basic block was scanned in order to find instructions
to merge and the above sequence would be transformed to:

store4 <a, b, c, d>, ptr + 4
store e, ptr + 20
store r, ptr

With this change, we now sort all the candidate merge instructions by their offset,
so instructions are visited in offset order rather than in the order they appear
in the basic block. We now transform this sequnce into:

store4 <f, a, b, c>, ptr
store2 <d, e>, ptr + 16

Another benefit of this change is that since we have sorted the mergeable lists
by offset, we can easily check if an instruction is mergeable by checking the
offset of the instruction that becomes before or after it in the sorted list.
Once we determine an instruction is not mergeable we can remove it from the list
and avoid having to do the more expensive mergeablilty checks.

Diff Detail

Event Timeline

tstellar created this revision.Aug 8 2019, 12:50 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 8 2019, 12:50 PM
arsenm added inline comments.Aug 12 2019, 1:09 PM
llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
228

Why std::list?

278–279

Weird format

1936–1940

Don't you know which is first from which was encountered first?

I think the code would benefit from the refactoring I've mentioned on the other patch, where the lists only hold a structure with information on a single instruction. Maybe call it CandidateInfo (information of one instruction, persistent in lists) vs. CombineInfo (information on a pair, only temporary).

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
567–572

Can that really happen? Only instructions with the same InstClass should be added to the same list.

1936

Wasn't there some talk about ordered basic blocks? They exist for IR apparently, but not for MIR unless we're tracking live ranges, which we don't do here, so...

This pass could perhaps number the CombineInfo instructions in order as they're collected at the start? It'd have to be kept uptodate as instructions are merged.

tstellar updated this revision to Diff 232684.Dec 6 2019, 6:47 PM

Rebase on master and update for a recent refactoring.

Build result: FAILURE - Could not check out parent git hash "8d7b8123bba35d4092d66750a83f9d0980bd169c". It was not found in the repository. Did you configure the "Parent Revision" in Phabricator properly? Trying to apply the patch to the master branch instead...

ERROR: arc patch failed with error code 1. Check build log for details.
Log files: console-log.txt, CMakeCache.txt

tstellar marked 4 inline comments as done.Dec 6 2019, 6:56 PM

I think the code would benefit from the refactoring I've mentioned on the other patch, where the lists only hold a structure with information on a single instruction. Maybe call it CandidateInfo (information of one instruction, persistent in lists) vs. CombineInfo (information on a pair, only temporary).

See D71045 .

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
1936

I added an Order field to keep track of the ordering.

nhaehnle accepted this revision.Dec 8 2019, 10:12 AM

Thanks. This basically looks good to me, some minor nitpicks.

llvm/lib/Target/AMDGPU/SILoadStoreOptimizer.cpp
786

Should be "Handle DS instructions."

849–850

This method should probably be renamed now since its meaning has shifted by quite a lot. Not sure what name is best, maybe checkAndPrepareMerge? The "prepare" part refers to collecting the InstsToMove.

Come to think of it, maybe the InstsToMove could be moved out of the CombineInfo struct? Then the CombineInfo arguments here could almost be passed as const, except for the fixup by the final offsetsCanBeCombined.

2010–2011

Maybe move this check into optimizeInstsWithSameBaseAddr? That function is already responsible for handling the case where the size is 1...

This revision is now accepted and ready to land.Dec 8 2019, 10:12 AM
tstellar updated this revision to Diff 238383.Wed, Jan 15, 3:54 PM
tstellar marked 3 inline comments as done.

+ Remove InstsToMove from CombineInfo struct
+ Address other review comments.

tstellar marked an inline comment as done.Wed, Jan 15, 3:54 PM
arsenm accepted this revision.Wed, Jan 15, 4:24 PM

LGTM

Unit tests: pass. 61849 tests passed, 0 failed and 781 were skipped.

clang-tidy: unknown.

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml