This is work in progress, please provide feedback. It supersedes http://reviews.llvm.org/D13586 based on comments there.
At a high level, the motivation for these changes is:
- Add llvm.amdgcn.buffer.load.format intrinsic to expose (almost) the full range of what the hardware can do (minus addr64 mode and D16 variants, both of which should arguably get their own intrinsics).
- For both image loads/samples and buffer loads, split the (simple) optimization of determining the appropriate size/dmask of the load into an IR-level CodeGenPrepare pass. The selection of the appropriate machine instruction stays where it is: in SIISelLowering for images and in the TableGen patterns for buffer loads.
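To make the size/dmask computation concrete, here is a minimal standalone sketch of the arithmetic only. It is not the code in this patch and deliberately avoids the LLVM API. The helper names (usedLaneMask, shrunkBufferWidth, shrunkImageDmask) are hypothetical, and it assumes that the result lanes actually read by users have already been collected, that buffer format loads return leading components only (X, XY, XYZ, XYZW), and that image instructions pack the dmask-enabled channels into consecutive result lanes.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: fold the result lanes that users actually read
// into a bitmask (bit i set means lane i of the loaded vector is read).
unsigned usedLaneMask(const std::vector<unsigned> &lanes) {
  unsigned mask = 0;
  for (unsigned lane : lanes) {
    assert(lane < 4 && "loads return at most four components");
    mask |= 1u << lane;
  }
  return mask;
}

// Buffer format loads (assumption: only leading components can be
// returned), so the shrunk width is the highest used lane plus one.
unsigned shrunkBufferWidth(unsigned usedMask) {
  unsigned width = 0;
  for (unsigned lane = 0; lane < 4; ++lane)
    if (usedMask & (1u << lane))
      width = lane + 1;
  return width;
}

// Image loads/samples (assumption: channels enabled in the dmask are
// packed into consecutive result lanes). Keep only the channels whose
// packed lane is read; the new result size is the popcount of the result.
unsigned shrunkImageDmask(unsigned oldDmask, unsigned usedMask) {
  unsigned newDmask = 0;
  unsigned lane = 0; // index into the packed result vector
  for (unsigned channel = 0; channel < 4; ++channel) {
    if (!(oldDmask & (1u << channel)))
      continue; // channel was never loaded, so it occupies no lane
    if (usedMask & (1u << lane))
      newDmask |= 1u << channel;
    ++lane;
  }
  return newDmask;
}
```

Under these assumptions, a buffer load whose users read only lanes 0 and 2 still needs a width of 3, which is exactly the XYZ case discussed under the known issues.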
Known issues / questions:
- Some regressions in image/sample-related tests. Also, buffer.load.format tests obviously need to be expanded as noted.
- It is annoying to match the MIMG-related intrinsics by name at the IR level, but this seems to be necessary as long as those intrinsics are defined in the target .td files, and so no IntrinsicID is assigned.
- The big one: the v3f32 variant (BUFFER_LOAD_FORMAT_XYZ) is currently not supported because v3f32 is not a MachineValueType, and the type legalization step of codegen bails out. IMO, the clean solution would be to argue that since amdgcn is a real existing target that genuinely has 3-element vector instructions for non-crazy reasons, v3f32 (and v3i32) should be added to MachineValueType. But of course, this is a pretty core change, and the image path in SIISelLowering did not go that route and hacked around it instead.
I have not actually looked into an alternative design that avoids modifying MVT. Since at the IR level the desired size must be represented by the return type of the intrinsic, any workaround would somehow involve convincing the type legalization step to accept v3f32 (unlike in the image case, where the IR level always uses v4f32 and implicitly stores the size in the dmask).
Perhaps there is a way of telling the target-independent codegen to accept v3f32s without actually adding it to MVT?
So... going the route of adding v3f32 and v3i32 to MVT is the best path that I can see, but before I do that, I want to get feedback on the plan.
- Eventually, Mesa and other clients are intended to emit the buffer.load.format intrinsic directly. In the meantime, SI.load.input should be transformed early on. Where is the best place to do that from a design POV? The SILoadShrink pass is an obvious candidate, but it clashes with the name given to that pass.
A small nit: We define such intrinsics in SIIntrinsics.td