This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Mark scalar loads as rematerializable
ClosedPublic

Authored by arsenm on Jul 29 2015, 6:59 PM.

Details

Reviewers
tstellar
tstellarAMD
rampitec
piotr
Group Reviewers
Restricted Project
Summary

This should be true, but this is useless as is. The rematerialization
logic only permits rematerialize with constant physical register uses,
so non-constant physregs or virtual register uses (the case that
really matters) are not rematerialized. Add the tests which shows
nothing happens, but should in the future.

Also, all loads should really be rematerializable so in the future
this should apply to all the other kinds.

Diff Detail

Event Timeline

tstellarAMD retitled this revision from to AMDGPU/SI: Mark SMRD instructions as rematerializable.
tstellarAMD updated this object.
tstellarAMD added a reviewer: arsenm.
tstellarAMD added a subscriber: llvm-commits.
arsenm edited edge metadata.Jul 29 2015, 8:12 PM

I don't this is generally OK. It will probably work now since we only use SMRD currently for constant memory, but if it were a volatile load it can't be rematerialized. The generic version checks the AA if the load is invariant.

Also, is rematerializing SMRD instructions to save SGPR spills really a win? 64 SGPRs can be spilled to a single VGPR, but the rematerialized SMRD instruction is only cheap if the loaded value is still in cache, right?

arsenm added inline comments.Jun 21 2016, 1:36 PM
lib/Target/AMDGPU/SIInstrInfo.cpp
79–81 ↗(On Diff #30980)

We probably don't need this. It looks like if isReallyTriviallyReMaterializable fails, it will still check isReallyTriviallyReMaterializableGeneric which checks the AA for you

mareko added a subscriber: mareko.Jun 28 2016, 7:24 AM

Also, is rematerializing SMRD instructions to save SGPR spills really a win? 64 SGPRs can be spilled to a single VGPR, but the rematerialized SMRD instruction is only cheap if the loaded value is still in cache, right?

Yes, it is a win. The SMEM cache is shared by 4 CUs, which means only one out of at least 16 and at most 160 waves will get a cache miss in most cases. That leaves us with cache hits, which take 16-20 clocks (I don't remember exactly), which is the approximate cost of 4 v_read/writelane instructions. By that logic, spilling and restoring is twice as expensive as repeating the load.

arsenm accepted this revision.Aug 3 2017, 4:55 PM

A test would be nice. This should assert that it isn't a store though in case we see a scalar store

This revision is now accepted and ready to land.Aug 3 2017, 4:55 PM
arsenm commandeered this revision.Nov 29 2022, 4:23 PM
arsenm edited reviewers, added: tstellarAMD; removed: arsenm.
This revision now requires review to proceed.Nov 29 2022, 4:23 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 29 2022, 4:23 PM
arsenm added a reviewer: Restricted Project.Nov 29 2022, 4:23 PM

I think we need https://reviews.llvm.org/D106408 to make this really work, and avoid the special case in isReallyTriviallyReMaterializable

I do not think this is OK. We may issue a scalar load before a first potentially aliasing store. After the store it must become a VMEM load because caches are not coherent. That said you cannot reorder SMRD with aliasing VMEM store, both because of the coherency and because memory may be indeed overwritten.

I do not think this is OK. We may issue a scalar load before a first potentially aliasing store. After the store it must become a VMEM load because caches are not coherent. That said you cannot reorder SMRD with aliasing VMEM store, both because of the coherency and because memory may be indeed overwritten.

This only happens for invariant MMOs. This isn't the only condition

I do not think this is OK. We may issue a scalar load before a first potentially aliasing store. After the store it must become a VMEM load because caches are not coherent. That said you cannot reorder SMRD with aliasing VMEM store, both because of the coherency and because memory may be indeed overwritten.

This only happens for invariant MMOs. This isn't the only condition

Where is that checked? Also needs tests, both positive and negative.

I do not think this is OK. We may issue a scalar load before a first potentially aliasing store. After the store it must become a VMEM load because caches are not coherent. That said you cannot reorder SMRD with aliasing VMEM store, both because of the coherency and because memory may be indeed overwritten.

This only happens for invariant MMOs. This isn't the only condition

Where is that checked? Also needs tests, both positive and negative.

https://github.com/llvm/llvm-project/blob/442c13f9ff8a8c6739d649c5a68e065f2a727480/llvm/lib/CodeGen/TargetInstrInfo.cpp#L956

The patch as-is doesn't apply obviously. I wrote tests and found they don't work due to virtual register uses in any real case (which is I'm guessing what the special case in isReallyTriviallyReMaterializable is about)

I do not think this is OK. We may issue a scalar load before a first potentially aliasing store. After the store it must become a VMEM load because caches are not coherent. That said you cannot reorder SMRD with aliasing VMEM store, both because of the coherency and because memory may be indeed overwritten.

This only happens for invariant MMOs. This isn't the only condition

Where is that checked? Also needs tests, both positive and negative.

https://github.com/llvm/llvm-project/blob/442c13f9ff8a8c6739d649c5a68e065f2a727480/llvm/lib/CodeGen/TargetInstrInfo.cpp#L956

The patch as-is doesn't apply obviously. I wrote tests and found they don't work due to virtual register uses in any real case (which is I'm guessing what the special case in isReallyTriviallyReMaterializable is about)

I thought we are overriding TargetInstrInfo::isReallyTriviallyReMaterializable completely.

I thought we are overriding TargetInstrInfo::isReallyTriviallyReMaterializable completely.

That doesn't bypass the virtual register use checks.

arsenm updated this revision to Diff 484800.Dec 22 2022, 4:50 AM
arsenm added a reviewer: rampitec.

Rebase and add tests, but this doesn't do anything. Either need to bring back the hack or fix virtual register handling

Herald added a project: Restricted Project. · View Herald TranscriptDec 22 2022, 4:50 AM
arsenm retitled this revision from AMDGPU/SI: Mark SMRD instructions as rematerializable to AMDGPU: Mark scalar loads as rematerializable.Dec 22 2022, 4:52 AM
arsenm edited the summary of this revision. (Show Details)
foad added a comment.Jun 5 2023, 11:58 PM

Is this NFC?

arsenm added a subscriber: piotr.Jun 6 2023, 5:48 AM

Is this NFC?

Apparently not, @piotr reported some big improvements. So I suppose this could use more tests where it demonstrates real improvements

arsenm added a comment.Jun 6 2023, 5:49 AM

Is this NFC?

Apparently not, @piotr reported some big improvements. So I suppose this could use more tests where it demonstrates real improvements

Apparently that was with a patch of the patch, so this is probably still NFC until something else happens

piotr accepted this revision as: piotr.Jun 6 2023, 6:22 AM

Yes - this is NFC, but paves the way for other changes.

I experimented with adding isSMRD() check in AMDGPU hook SIInstrInfo::isReallyTriviallyReMaterializable on top of D11621, and that resulted in some useful rematerialization.

However, it also resulted in some redundant registers being used because there is no mechanism to shrink the load to only load the part needed for current rematerialization context. So that needs to be added before we extend SIInstrInfo::isReallyTriviallyReMaterializable I think.

This revision is now accepted and ready to land.Jun 6 2023, 6:22 AM