I never completed the work on the patches referenced by
f8bf7d7f42f28fa18144091022236208e199f331, but the restriction was
intended to avoid folding immediate writes into m0, which the
coalescer doesn't understand very well. Relax this to allow simple
SGPR immediates to fold directly into VGPR copies. This pattern shows
up routinely in current GlobalISel code, since nothing is smart enough
to emit VGPR constants yet.
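The pattern being relaxed can be sketched in MIR. This is an illustrative fragment, not taken from the patch; register classes and opcodes are the usual AMDGPU ones:

```
; Before folding: GlobalISel materializes the constant in an SGPR,
; then copies it into a VGPR.
%0:sreg_32 = S_MOV_B32 42
%1:vgpr_32 = COPY %0:sreg_32

; After folding the immediate directly into the VGPR copy, the copy
; becomes a VGPR move of the constant, and %0 may become dead:
%1:vgpr_32 = V_MOV_B32_e32 42
```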
I do not think this can work with physregs given the CFG.
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Line 654: Source must be strictly scalar. You can have a copy of vector to vector (say a to v) and it can be initialized with a different mask. In fact, how can we be sure even an SGPR was not initialized in a different CFG path? This shall never happen to a virtual register in SSA, but can easily happen to a physreg.

Line 658: This does fit into a single line.
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Line 654: This is folding a constant. The mask doesn't matter. This is also not analyzing physreg defs; it's searching the uses of an SSA virtual register.
llvm/lib/Target/AMDGPU/SIFoldOperands.cpp

Line 654: Ah, right! It is the destination that is physical, not the source.
It can probably create worse code sometimes. Assume you have a constant which cannot be directly assigned to a register (v_accvgpr_write_b32 only accepts inline literals). If the source def does not become dead, folding will create an extra copy. You probably need some legality checks here.
But that is really orthogonal to what you are doing and is a separate change. It applies equally to virtual registers.
LGTM, just fix formatting at line 657.
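The worse-code scenario raised above can be sketched in MIR. This is an illustrative fragment under the stated assumption that the literal is not an inline constant and the SGPR def has another use:

```
; %0 has a second use, so it does not become dead after folding:
%0:sreg_32 = S_MOV_B32 305419896         ; 0x12345678, not an inline literal
%1:agpr_32 = COPY %0                     ; AGPR writes only take inline literals
%2:sreg_32 = S_ADD_U32 %0, %0

; Folding the literal into the AGPR write still needs an intermediate
; VGPR move, while the S_MOV remains live, i.e. one extra instruction:
%3:vgpr_32 = V_MOV_B32_e32 305419896
%1:agpr_32 = V_ACCVGPR_WRITE_B32 %3
```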