This is an archive of the discontinued LLVM Phabricator instance.

[Spill2Reg][1/9] Initial commit. This is boilerplate code.
Accepted · Public

Authored by vporpo on Jan 26 2022, 5:32 PM.

Details

Summary

This is the first commit for the Spill2Reg optimization pass.
The goal of this pass is to selectively replace spills to the stack with
spills to vector registers. This can help remove back-end stalls on x86.
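The core transformation can be pictured with a toy model. The sketch below is hypothetical. The `Inst`/`Op` types and the `spill2reg` function are illustrative stand-ins, not the pass's actual code or LLVM's MachineInstr API. It assumes straight-line code in which every spill precedes its reloads and the listed vector registers are free throughout. On x86 the rewritten forms would correspond to GPR-to-vector moves such as `movd`/`movq`.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Hypothetical, heavily simplified instruction model for illustration only;
// this is not LLVM's MachineInstr API.
enum class Op { Spill, Reload, Other, SpillToVec, ReloadFromVec };

struct Inst {
  Op op;
  int gpr = -1;     // scalar register being spilled or reloaded
  int slot = -1;    // stack slot used by Spill/Reload
  int vecReg = -1;  // vector register used by SpillToVec/ReloadFromVec
};

// Replaces stack spills/reloads with moves to/from a vector register.
// Assumes straight-line code and that every register in `freeVecRegs` is
// unused for the whole sequence; a real pass would need a liveness query.
std::vector<Inst> spill2reg(std::vector<Inst> code,
                            std::vector<int> freeVecRegs) {
  std::map<int, int> slotToVec;  // stack slot -> vector register holding it
  for (Inst &i : code) {
    if (i.op == Op::Spill) {
      assert(i.slot >= 0 && "a spill needs a stack slot");
      auto it = slotToVec.find(i.slot);
      if (it == slotToVec.end()) {
        if (freeVecRegs.empty())
          continue;  // no free vector register: keep the stack spill
        it = slotToVec.emplace(i.slot, freeVecRegs.back()).first;
        freeVecRegs.pop_back();
      }
      i.op = Op::SpillToVec;
      i.vecReg = it->second;
    } else if (i.op == Op::Reload) {
      assert(i.slot >= 0 && "a reload needs a stack slot");
      auto it = slotToVec.find(i.slot);
      if (it == slotToVec.end())
        continue;  // slot was not converted: keep the stack reload
      i.op = Op::ReloadFromVec;
      i.vecReg = it->second;
    }
  }
  return code;
}
```

With one free vector register and two spilled slots, only the first slot's spill/reload pair is converted; the second keeps its stack round trip.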

RFC:
https://lists.llvm.org/pipermail/llvm-dev/2022-January/154782.html
https://discourse.llvm.org/t/rfc-spill2reg-selectively-replace-spills-to-stack-with-spills-to-vector-registers/59630

Event Timeline

vporpo created this revision. Jan 26 2022, 5:32 PM
vporpo requested review of this revision. Jan 26 2022, 5:32 PM
Herald added a project: Restricted Project. Jan 26 2022, 5:32 PM
wxiao3 added a subscriber: wxiao3. Jan 26 2022, 7:06 PM
vporpo edited the summary of this revision. Jan 26 2022, 7:31 PM
lkail added a subscriber: lkail. Jan 26 2022, 8:07 PM
Matt added a subscriber: Matt. Jan 27 2022, 4:00 AM
mdchen added a subscriber: mdchen. Jan 27 2022, 5:51 PM
vporpo updated this revision to Diff 403892. Jan 27 2022, 10:38 PM

Disable the pass by default.

vporpo edited the summary of this revision. Feb 3 2022, 3:17 PM
arsenm added a subscriber: arsenm. Feb 3 2022, 3:22 PM
arsenm added inline comments.
llvm/lib/CodeGen/Spill2Reg.cpp, lines 26–28:

Probably should move this to TargetPassConfig.

vporpo updated this revision to Diff 405826. Feb 3 2022, 4:14 PM
vporpo edited the summary of this revision.

Moved spill2reg flag to TargetPassConfig.

llvm/lib/CodeGen/Spill2Reg.cpp, lines 26–28:

Done.

RKSimon added a subscriber: RKSimon. Feb 4 2022, 1:30 AM

Please can you rename the Spill2Reg patch sequence using [X/N] so we can more easily track dependencies.

vporpo retitled this revision from "[Spill2Reg] Initial commit. This is boilerplate code." to "[Spill2Reg][1/9] Initial commit. This is boilerplate code." Feb 4 2022, 9:48 AM
vporpo added a reviewer: RKSimon.

Very good idea!
Going further, we could create a concept of "cheaper spills" for scalar registers in the register allocator (by checking their interference with vector registers) and use it to recalculate the spill energy in the spill placer.
And if we can load/store several scalar registers from/to one vector register, it may be profitable to spill a vector register (even when the scalar registers interfere with vector registers) instead of spilling more scalar registers.

Yes, a two-tier spilling scheme might make sense for some targets: first spill to consecutive lanes in the vector, and then spill the vectors to memory. I think, though, that on x86 it may be a lot trickier to determine when this will perform better than standard spills to the stack.
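The two-tier scheme discussed here can be modeled in a few lines. This is a hypothetical sketch, not code from the patch series: `Vec128`, `packScalars`, `spillVector`, `reloadVector`, and `extractLane` are illustrative names, and a plain `std::array` stands in for a 128-bit register holding four 32-bit lanes. The x86 instructions named in the comments are only indicative.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of a 128-bit vector register with four 32-bit lanes.
using Vec128 = std::array<std::uint32_t, 4>;
constexpr std::size_t kVecBytes = 4 * sizeof(std::uint32_t);  // 16 bytes

// Tier 1: place scalars into consecutive lanes of a vector register
// (on x86, something like one pinsrd per lane).
Vec128 packScalars(const std::uint32_t *vals, std::size_t n) {
  assert(n <= 4 && "a 128-bit vector holds at most four 32-bit lanes");
  Vec128 v{};  // unused lanes stay zero
  for (std::size_t lane = 0; lane < n; ++lane)
    v[lane] = vals[lane];
  return v;
}

// Tier 2: spill the whole vector to the stack with a single 16-byte store.
void spillVector(const Vec128 &v, unsigned char *stackSlot) {
  std::memcpy(stackSlot, v.data(), kVecBytes);
}

// Reload path: one 16-byte load brings all lanes back at once.
Vec128 reloadVector(const unsigned char *stackSlot) {
  Vec128 v{};
  std::memcpy(v.data(), stackSlot, kVecBytes);
  return v;
}

// Extract one scalar again (on x86, something like pextrd).
std::uint32_t extractLane(const Vec128 &v, std::size_t lane) {
  assert(lane < v.size());
  return v[lane];
}
```

The appeal is that several scalar stores collapse into one vector store; the cost is the extra insert/extract work per lane, which is why profitability on x86 is hard to judge.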

arsenm accepted this revision. Feb 10 2022, 8:32 AM

LGTM, but wait for more of the rest of the sequence before submitting.

llvm/lib/CodeGen/Spill2Reg.cpp, line 53:

Pointless overload and ;?

This revision is now accepted and ready to land. Feb 10 2022, 8:32 AM
vporpo updated this revision to Diff 407581. Feb 10 2022, 9:39 AM

Removed redundant overload.

Herald added a project: Restricted Project. Jun 16 2022, 12:02 PM
yiminli added a subscriber: yiminli. Oct 15 2022, 10:13 PM

Should this patch set optimize out the spill/reload in function "test" of test.cpp?

I tried:

llvm-project/build/bin/clang -S -emit-llvm test.cpp -march=skylake-avx512 -O2
llvm-project/build/bin/llc \
	-enable-spill2reg -simplify-mir -spill2reg-mem-instrs=0 -spill2reg-vec-instrs=99999 \
	-march x86 -mattr=+avx2 -filetype=asm --x86-asm-syntax=intel test.ll

and got:

_Z4testPfS_S_:                          # @_Z4testPfS_S_
        .cfi_startproc
# %bb.0:                                # %entry
        sub     esp, 140
        .cfi_def_cfa_offset 144
        mov     eax, dword ptr [esp + 152]
        mov     ecx, dword ptr [esp + 144]
        vmovaps zmm0, zmmword ptr [ecx]
        vmovups zmmword ptr [esp + 64], zmm0    # 64-byte Spill
        mov     ecx, dword ptr [esp + 148]
        vmovaps zmm1, zmmword ptr [ecx]
        vaddps  zmm0, zmm0, zmm1
        vaddps  zmm1, zmm1, zmmword ptr [eax]
        vmovups zmmword ptr [esp], zmm1         # 64-byte Spill
        call    _Z12print_m512_fDv16_f
        vmovups zmm0, zmmword ptr [esp]         # 64-byte Reload
        call    _Z12print_m512_fDv16_f
        vmovups zmm0, zmmword ptr [esp + 64]    # 64-byte Reload
        add     esp, 140
        .cfi_def_cfa_offset 4
        ret

Did I miss anything? Thanks!

In the test you provided, vector registers are spilled across the call. Spill2Reg will not try to work with vector spills/reloads. The reasoning is that if there were a free register for Spill2Reg to use, the register allocator would already have found it and used it to avoid the spill. It would also make little sense performance-wise, because saving/restoring a vector register to another vector register is simply redundant: you could have just used the destination register in the first place.
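The candidate rule described in this reply can be written as a small predicate. This is a hypothetical sketch: `RegKind` and `isSpill2RegCandidate` are illustrative names, not the pass's actual API, and LLVM's real x86 register classes (e.g. GR32, VR128) are TableGen-generated and richer than this enum.

```cpp
#include <cassert>

// Hypothetical register-kind tags standing in for real register classes.
enum class RegKind { GPR32, GPR64, Vec128, Vec256, Vec512 };

// Only scalar (general-purpose) spills are considered. Vector spills are
// skipped: a free vector register would already have been used by the
// register allocator, and copying a vector into another vector register
// would be redundant.
bool isSpill2RegCandidate(RegKind kind) {
  switch (kind) {
  case RegKind::GPR32:
  case RegKind::GPR64:
    return true;
  case RegKind::Vec128:
  case RegKind::Vec256:
  case RegKind::Vec512:
    return false;
  }
  assert(false && "unknown register kind");
  return false;
}
```

Under this rule, the 64-byte zmm spills in the example above are never candidates.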