Since v4i32/v4f32 are legal, SelectionDAGBuilder promotes v3i32/v3f32
arguments to v4i32/v4f32, which consume an additional register.
In addition to wasting argument space, this produces extra
instructions, since most combines now treat the 4th vector component
as if it held a meaningful value.
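The register cost can be illustrated with a toy model. This is a hypothetical sketch, not SelectionDAGBuilder's actual logic: it assumes each 32-bit element occupies one register and that promotion rounds the element count up to the next power of two (so a 3-element vector becomes a 4-element one). The function names are made up for illustration.

```python
def widened_elements(num_elements: int) -> int:
    """Round an element count up to the next power of two (hypothetical widening rule)."""
    width = 1
    while width < num_elements:
        width *= 2
    return width


def registers_used(num_elements: int, widen: bool) -> int:
    """32-bit registers occupied by an argument with num_elements 32-bit lanes."""
    return widened_elements(num_elements) if widen else num_elements


# A <3 x i32> argument: 3 registers as declared, 4 once promoted to <4 x i32>.
print(registers_used(3, widen=False))
print(registers_used(3, widen=True))
```

Under these assumptions the promoted argument carries one extra register whose contents are undefined, which is the lane that combines then treat as live.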
Event Timeline
Inline comment on test/CodeGen/AMDGPU/ret.ll, line 165 (on Diff #154548):
This can't be changed. <3 x i32> is a valid function argument type meaning 3 input VGPRs. There is no wasted space. It declares exactly 3 VGPRs. The VGPR indices are hardcoded in the hardware and can't be adjusted.
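Why widening would break a fixed register layout can be sketched with another toy model. It is a simplification, not the AMDGPU calling-convention code: it assumes shader input arguments are packed into consecutive VGPRs starting at v0, one VGPR per 32-bit element. The signature (<3 x i32>, float) and the helper name are hypothetical.

```python
from typing import List


def assign_vgprs(arg_dwords: List[int]) -> List[int]:
    """Return the first VGPR index of each argument, packing arguments back to back."""
    starts = []
    next_vgpr = 0
    for size in arg_dwords:
        starts.append(next_vgpr)
        next_vgpr += size
    return starts


# Hypothetical shader signature: (<3 x i32>, float).
declared = assign_vgprs([3, 1])  # float is expected in v3
widened = assign_vgprs([4, 1])   # after widening, float would be read from v4
print(declared, widened)
```

In this model, if the hardware initializes the float input in v3, widening the first argument makes the code read the float from v4 instead, which matches the reviewer's point that the declared <3 x i32> must map to exactly 3 VGPRs.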
I can confirm Mesa now works with this patch; however, it still breaks 9 lit tests for me:
  LLVM :: CodeGen/AMDGPU/fceil.ll
  LLVM :: CodeGen/AMDGPU/fmaxnum.ll
  LLVM :: CodeGen/AMDGPU/fpext.ll
  LLVM :: CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
  LLVM :: CodeGen/AMDGPU/insert_vector_elt.ll
  LLVM :: CodeGen/AMDGPU/kernel-args.ll
  LLVM :: CodeGen/AMDGPU/max.ll
  LLVM :: CodeGen/AMDGPU/store-global.ll
  LLVM :: CodeGen/AMDGPU/store-private.ll