Page MenuHomePhabricator

AMDGPU: Stop wasting argument registers with v3i32/v3f32
ClosedPublic

Authored by arsenm on Jul 9 2018, 1:45 AM.

Details

Summary

Since v4i32/v4f32 are legal, SelectionDAGBuilder promotes these
to v4i32/v4f32 arguments which consume an additional register.
In addition to wasting argument space, this produces extra
instructions since now it appears the 4th vector component has
a meaningful value to most combines.

Diff Detail

Event Timeline

arsenm created this revision.Jul 9 2018, 1:45 AM

Typo in summary v4i32/v4f32 -> v3i32,/v3f32.

arsenm added a comment.Jul 9 2018, 8:09 AM

Typo in summary v4i32/v4f32 -> v3i32,/v3f32.

I think it's right but I can reword it to be less ambiguous sounding

Can you please add tests for <3 x i64> and <3 x double>?

mareko added inline comments.Jul 9 2018, 6:33 PM
test/CodeGen/AMDGPU/ret.ll
165 ↗(On Diff #154548)

This can't be changed. <3 x i32> is a valid function argument type meaning 3 input VGPRs. There is no wasted space. It declares exactly 3 VGPRs. The VGPR indices are hardcoded in the hardware and can't be adjusted.

This patch along with the prerequisite patch doesn't break Mesa.

This patch along with the prerequisite patch doesn't break Mesa.

What's the prerequisite patch?

This revision is now accepted and ready to land.Jul 13 2018, 9:46 AM

This patch along with the prerequisite patch doesn't break Mesa.

What's the prerequisite patch?

D49128

I can confirm Mesa now works with this patch, however, it still breaks 9 lit tests for me:

LLVM :: CodeGen/AMDGPU/fceil.ll
LLVM :: CodeGen/AMDGPU/fmaxnum.ll
LLVM :: CodeGen/AMDGPU/fpext.ll
LLVM :: CodeGen/AMDGPU/hsa-metadata-from-llvm-ir-full.ll
LLVM :: CodeGen/AMDGPU/insert_vector_elt.ll
LLVM :: CodeGen/AMDGPU/kernel-args.ll
LLVM :: CodeGen/AMDGPU/max.ll
LLVM :: CodeGen/AMDGPU/store-global.ll
LLVM :: CodeGen/AMDGPU/store-private.ll
arsenm closed this revision.Jul 28 2018, 7:12 AM

r338197