Page MenuHomePhabricator

[AMDGPU] Enable FLAT LDS DMA on gfx9/10 before gfx940
ClosedPublic

Authored by rampitec on May 6 2022, 1:33 PM.

Details

Summary

We always had global and scratch loads to LDS in the gfx9,
but did not handle it. These were available via the 'lds'
encoding bit. In gfx940 this bit was reused as 'svs' which
resulted in new '_lds' opcodes effectively pushing this
bit into the opcode, but functionally it is the same. These
instructions are also available on gfx10.

Diff Detail

Event Timeline

rampitec created this revision.May 6 2022, 1:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2022, 1:33 PM
rampitec requested review of this revision.May 6 2022, 1:33 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 6 2022, 1:33 PM
Herald added a subscriber: wdng. · View Herald Transcript

A potentially better alternative is to use gfx940 names with _LDS_ in the mnemonic instead of a modifier. This is logically a different opcode anyway. The only downside it is not compatible with the documentation and sp3. But then it was not implemented before and therefore not used, so there shall be no compatibility problem on practice. Well, it will also be different from MUBUF. Given the difference in both semantics and addressing mode I personally would prefer it to be different opcodes. At a pseudo level it is certainly easier to have separate ops for this.

Preferences?

One more thing to note: it is already incompatible with sp3 because we prohibit unused vdst, while sp3 enforces it.

arsenm added a comment.May 7 2022, 2:58 AM

A potentially better alternative is to use gfx940 names with _LDS_ in the mnemonic instead of a modifier. This is logically a different opcode anyway. The only downside it is not compatible with the documentation and sp3. But then it was not implemented before and therefore not used, so there shall be no compatibility problem on practice. Well, it will also be different from MUBUF. Given the difference in both semantics and addressing mode I personally would prefer it to be different opcodes. At a pseudo level it is certainly easier to have separate ops for this.

Preferences?

It's probably better to have separate opcodes. In general I think the way we try to force all of these subtarget changes onto the same generic pseudos is more trouble than it's worth. It requires more and more code to verify and make use of the features, and it would be cleaner to move towards separate instruction definitions per subtarget

A potentially better alternative is to use gfx940 names with _LDS_ in the mnemonic instead of a modifier. This is logically a different opcode anyway. The only downside it is not compatible with the documentation and sp3. But then it was not implemented before and therefore not used, so there shall be no compatibility problem on practice. Well, it will also be different from MUBUF. Given the difference in both semantics and addressing mode I personally would prefer it to be different opcodes. At a pseudo level it is certainly easier to have separate ops for this.

Preferences?

It's probably better to have separate opcodes. In general I think the way we try to force all of these subtarget changes onto the same generic pseudos is more trouble than it's worth. It requires more and more code to verify and make use of the features, and it would be cleaner to move towards separate instruction definitions per subtarget

I remember your idea about switchable instruction tables per subtarget, but this is not really that. This is more about the asm syntax compatibility: to make all gfx9/gfx10 the same for these instructions, or to follow the spec which was amended for gfx940. It was amended purely due to encoding considerations, but semantically it is still the same instructions and do exactly the same. So I believe using same pseudos it warranted here.

What I've heard on today's meeting we are leaning towards spec compatibility, so then the patch does exactly that.

arsenm accepted this revision.May 17 2022, 11:54 AM
This revision is now accepted and ready to land.May 17 2022, 11:54 AM
This revision was landed with ongoing or failed builds.May 17 2022, 12:16 PM
This revision was automatically updated to reflect the committed changes.