This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: Use destination register bank in applyMappingLoad
ClosedPublic

Authored by Petar.Avramovic on May 6 2021, 6:17 AM.

Details

Summary

Large loads on target that does not useFlatForGlobal have to be split
in regbankselect. This did not happen in case when destination had vgpr
bank and address had sgpr bank.
Instead of checking if address bank is sgpr check bank of the destination.

Diff Detail

Event Timeline

Petar.Avramovic created this revision.May 6 2021, 6:17 AM
Petar.Avramovic requested review of this revision.May 6 2021, 6:17 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2021, 6:17 AM

This was detected on gfx7 on large test (few hundred lines) with many uniform loads.
Loads in first ~200 lines had amdgpu.noclobber but after some threshold they did not.
AnnotateUniformValues was relying on MemoryDependenceResults::getSimplePointerDependencyFrom which gives up at some point and returns MemDepResult::getUnknown() resulting in amdgpu.noclobber not being set on address. Such load will be selected using vgpr destination (and sgpr address on gfx7).
In the case of mentioned test, amdgpu.noclobber not being set is fixed by D101962. Thus only mir test.

arsenm added a comment.May 6 2021, 3:33 PM

Can you also add an end to end IR test for this case

llvm/test/CodeGen/AMDGPU/GlobalISel/load-constant.96.ll
449–453

This looks like an accidental bug fix

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-uniform-load-noclobber.mir
4–14

Don't need the IR section, just mark the MMOs as invariant

foad added inline comments.May 7 2021, 3:15 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/load-constant.96.ll
449–453

Why is it a bug fix? Isn't it better to use a scalar load when possible, cos it's less memory traffic?

Petar.Avramovic added inline comments.May 7 2021, 3:20 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/load-constant.96.ll
449–453

It is a silent bug, that load was marked as vgpr dest, and then split into two sgpr dest loads (was meant to be one buffer load according to banks assigned before applyMapping)

Test split for precommit, found another test for end to end run.

Can you also add an end to end IR test for this case

No longer possible because of D101962 (also was way too large for lit test).
I did find another test, it needs -mattr=+unaligned-access-mode and align 1 to pass through legalizer and get vgpr dst + sgpr address.

arsenm added inline comments.May 7 2021, 10:14 AM
llvm/test/CodeGen/AMDGPU/GlobalISel/load-constant.96.ll
449–453

Scalar loads don't support unaligned access, so this was wrong to be using them

arsenm accepted this revision.May 7 2021, 10:16 AM

LGTM, although I'm a bit concerned we were selecting unaligned scalar loads so there may be another bug hidden in here

This revision is now accepted and ready to land.May 7 2021, 10:16 AM