This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU/GlobalISel: RegBankSelect interp intrinsics
ClosedPublic

Authored by arsenm on Jul 17 2019, 6:05 AM.

Details

Reviewers
nhaehnle
tstellar
Summary

Note this assumes the future use of immediate for immarg, not the current situation which G_CONSTANT

Diff Detail

Event Timeline

arsenm created this revision.Jul 17 2019, 6:05 AM
arsenm edited the summary of this revision. (Show Details)Jul 17 2019, 6:06 AM
nhaehnle added inline comments.Jul 22 2019, 2:33 AM
lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1332 ↗(On Diff #210301)

We don't waterfall these in the non-GlobalISel path as far as I can tell? Should just use readfirstlane if necessary.

arsenm marked an inline comment as done.Jul 22 2019, 3:43 AM
arsenm added inline comments.
lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1332 ↗(On Diff #210301)

I think this is more of a defect in SelectionDAG because handling this correctly is hard. I’m considering adding a pseudo UniformVGPR register bank for cases where readfirstlane is OK

nhaehnle added inline comments.Jul 22 2019, 4:12 AM
lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1332 ↗(On Diff #210301)

The fact that a readfirstlane is generated instead of a waterfall loop is not a defect in SelectionDAG, at least not in general.

Frontend languages have builtins whose behavior is undefined when certain arguments are dynamically non-uniform. That is, programmers can use them in a context where the compiler cannot possibly prove that the value will be uniform, but the programmer implicitly guarantees that it will be at runtime. Emitting a readfirstlane for those when necessary is the correct behavior.

arsenm marked an inline comment as done.Jul 22 2019, 4:56 AM
arsenm added inline comments.
lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1332 ↗(On Diff #210301)

What about the case where optimizations introduce divergence into the value?

nhaehnle added inline comments.Jul 24 2019, 1:19 AM
lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1332 ↗(On Diff #210301)

Such transforms usually aren't optimizations, so mostly they shouldn't be done in the first place...

Okay, so admittedly there may be some cases where going the waterfall route is actually preferable. If this is really the case, then we probably need to do some more thinking about how to properly represent it, perhaps by having the base intrinsics (interp, load/store with resource descriptors) support divergence via waterfall loops, but adding some kind of "assume uniform" intrinsic whose semantics are that it passes through a value similarly to readfirstlane, except that the behavior is actually undefined if the input value is divergent.

Something like that seems like a reasonable long-term direction, but I'd say that goes beyond the scope of this change :)

arsenm updated this revision to Diff 235450.Dec 27 2019, 2:51 PM

Use readfirstlane

Herald added a project: Restricted Project. · View Herald TranscriptDec 27 2019, 2:51 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
arsenm marked an inline comment as done.Jan 20 2020, 4:35 AM

ping

lib/Target/AMDGPU/AMDGPURegisterBankInfo.cpp
1332 ↗(On Diff #210301)

These could also assert uniformity by using readfirstlane themselves

This revision is now accepted and ready to land.Jan 22 2020, 4:46 AM