Page MenuHomePhabricator

AMDGPU/GlobalISel: RegBankSelect interp intrinsics
Needs ReviewPublic

Authored by arsenm on Jul 17 2019, 6:05 AM.

Details

Reviewers
nhaehnle
tstellar
Summary

Note this assumes the future use of immediate for immarg, not the current situation which G_CONSTANT

Diff Detail

Event Timeline

arsenm created this revision.Jul 17 2019, 6:05 AM
arsenm edited the summary of this revision. (Show Details)Jul 17 2019, 6:06 AM
nhaehnle added inline comments.Mon, Jul 22, 2:33 AM
lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1332

We don't waterfall these in the non-GlobalISel path as far as I can tell? Should just use readfirstlane if necessary.

arsenm marked an inline comment as done.Mon, Jul 22, 3:43 AM
arsenm added inline comments.
lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1332

I think this is more of a defect in SelectionDAG because handling this correctly is hard. I’m considering adding a pseudo UniformVGPR register bank for cases where readfirstlane is OK

nhaehnle added inline comments.Mon, Jul 22, 4:12 AM
lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1332

The fact that a readfirstlane is generated instead of a waterfall loop is not a defect in SelectionDAG, at least not in general.

Frontend languages have builtins whose behavior is undefined when certain arguments are dynamically non-uniform. That is, programmers can use them in a context where the compiler cannot possibly prove that the value will be uniform, but the programmer implicitly guarantees that it will be at runtime. Emitting a readfirstlane for those when necessary is the correct behavior.

arsenm marked an inline comment as done.Mon, Jul 22, 4:56 AM
arsenm added inline comments.
lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1332

What about the case where optimizations introduce divergence into the value?

nhaehnle added inline comments.Wed, Jul 24, 1:19 AM
lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1332

Such transforms usually aren't optimizations, so mostly they shouldn't be done in the first place...

Okay, so admittedly there may be some cases where going the waterfall route is actually preferable. If this is really the case, then we probably need to do some more thinking about how to properly represent it, perhaps by having the base intrinsics (interp, load/store with resource descriptors) support divergence via waterfall loops, but adding some kind of "assume uniform" intrinsic whose semantics are that it passes through a value similarly to readfirstlane, except that the behavior is actually undefined if the input value is divergent.

Something like that seems like a reasonable long-term direction, but I'd say that goes beyond the scope of this change :)