This is the first commit for the Spill2Reg optimization pass.
The goal of this pass is to selectively replace spills to the stack with
spills to vector registers. This can help remove back-end stalls in x86.
Very good idea!
Going further, we could introduce a concept of "cheaper spills" for scalar regs in RA (by checking their interference with vector regs) and use it to recalculate the spill energy in spillplacer.
And if we can load/store several scalar regs from/to one vector, it may be profitable to spill a vector reg (even when interference exists between the scalar regs and vector regs) instead of spilling more scalar regs.
Yes, a two-tier spilling scheme might make sense for some targets: first spill to consecutive lanes in the vector, and then spill the vectors to memory. I think, though, that on x86 it may be a lot trickier to determine when this will perform better than standard spills to the stack.
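To make the trade-off concrete, here is a toy count of memory stores under the two schemes. It is only a sketch: the function names are made up for illustration, nothing here corresponds to code in the patch, and it deliberately ignores latency, port pressure and the cost of reloads, which is exactly the part that is hard to model on x86.

```cpp
#include <cassert>

// Hypothetical helper: stores to memory when every scalar gets its own stack slot.
int storesDirect(int numScalars) { return numScalars; }

// Hypothetical helper: stores to memory under the two-tier idea, where scalars
// are first packed into vector lanes and only the packed vectors go to the stack.
int storesTwoTier(int numScalars, int lanesPerVector) {
    assert(lanesPerVector > 0);
    // Ceiling division: a partially filled vector still needs one store.
    return (numScalars + lanesPerVector - 1) / lanesPerVector;
}

// Hypothetical helper: lane inserts needed to pack the scalars, one per scalar.
// This is the extra work a real cost model would have to weigh against the
// saved stores.
int laneInserts(int numScalars) { return numScalars; }
```

With 8 scalars and 4 lanes per vector, the two-tier scheme needs 2 stores instead of 8, but pays 8 lane inserts (and the matching extracts on reload), so fewer stores alone does not show it is a win.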
Should this patch set optimize out the spill/reload in function "test" in the following?
llvm-project/build/bin/clang -S -emit-llvm test.cpp -march=skylake-avx512 -O2
llvm-project/build/bin/llc \
  -enable-spill2reg -simplify-mir -spill2reg-mem-instrs=0 -spill2reg-vec-instrs=99999 \
  -march x86 -mattr=+avx2 -filetype=asm --x86-asm-syntax=intel test.ll
_Z4testPfS_S_:                          # @_Z4testPfS_S_
        .cfi_startproc
# %bb.0:                                # %entry
        sub     esp, 140
        .cfi_def_cfa_offset 144
        mov     eax, dword ptr [esp + 152]
        mov     ecx, dword ptr [esp + 144]
        vmovaps zmm0, zmmword ptr [ecx]
        vmovups zmmword ptr [esp + 64], zmm0    # 64-byte Spill
        mov     ecx, dword ptr [esp + 148]
        vmovaps zmm1, zmmword ptr [ecx]
        vaddps  zmm0, zmm0, zmm1
        vaddps  zmm1, zmm1, zmmword ptr [eax]
        vmovups zmmword ptr [esp], zmm1         # 64-byte Spill
        call    _Z12print_m512_fDv16_f
        vmovups zmm0, zmmword ptr [esp]         # 64-byte Reload
        call    _Z12print_m512_fDv16_f
        vmovups zmm0, zmmword ptr [esp + 64]    # 64-byte Reload
        add     esp, 140
        .cfi_def_cfa_offset 4
        ret
Did I miss anything? Thanks!
In the test you provided, vector registers are spilled across the call. Spill2reg does not try to handle vector spills/reloads. The reasoning is that if there were a free vector register for spill2reg to use, the register allocator would already have found it and used it to avoid the spill. It would also make little sense performance-wise, because saving/restoring a vector register to another vector register is simply redundant: you could have just used the destination register in the first place.
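The filtering rule described above can be sketched as a small toy model. This is not the pass's actual code: RegClass, Spill, isSpill2RegCandidate and countRedirected are hypothetical names chosen for illustration, and the real pass works on machine instructions rather than on a list like this.

```cpp
#include <cassert>
#include <vector>

// Hypothetical register classes for this toy model (not LLVM's real classes).
enum class RegClass { Scalar, Vector };

// A spill/reload pair, described only by the class of the spilled register.
struct Spill {
    RegClass cls;
};

// Returns true if the spill could be redirected to a vector register.
// Vector spills are rejected outright; scalar spills need a free vector register.
bool isSpill2RegCandidate(const Spill &s, int freeVectorRegs) {
    if (s.cls == RegClass::Vector)
        return false;
    return freeVectorRegs > 0;
}

// Counts how many spills would be redirected, consuming one free vector
// register per redirected scalar spill.
int countRedirected(const std::vector<Spill> &spills, int freeVectorRegs) {
    int redirected = 0;
    for (const Spill &s : spills) {
        if (isSpill2RegCandidate(s, freeVectorRegs)) {
            ++redirected;
            --freeVectorRegs;
        }
    }
    return redirected;
}
```

In this model the test above corresponds to two Vector spills (the two 64-byte zmm spills), so countRedirected returns 0 no matter how many vector registers are assumed free.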