This is an archive of the discontinued LLVM Phabricator instance.

[x86, AVX] allow explicit calls to VZERO* to modify state in VZeroUpperInserter pass
Closed, Public

Authored by spatel on May 23 2016, 10:16 AM.

Details

Summary

Although this fixes the duplicate VZ* instructions in the existing tests, we still have more problems.
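
As a rough illustration of the state tracking the title refers to, here is a minimal Python sketch of a per-block "upper halves dirty/clean" scan. It is a simplified model, not the actual VZeroUpperInserter code: the `Kind` categories, the `insert_vzeroupper` helper, and the instruction labels are all hypothetical. The only point it makes is that an explicit VZEROUPPER or VZEROALL resets the state to clean, so the scan does not add a redundant one before the next call or return.

```python
from enum import Enum


class Kind(Enum):
    USES_YMM = "uses_ymm"  # instruction reads or writes a 256-bit register
    VZERO = "vzero"        # explicit VZEROUPPER or VZEROALL
    CALL = "call"          # call or return into code that may use legacy SSE
    OTHER = "other"


def insert_vzeroupper(block, entry_dirty=False):
    """Return a copy of `block` with VZEROUPPER inserted where the model needs it.

    `block` is a list of (label, Kind) pairs. Hypothetical helper: a
    simplified, single-block model of the dirty/clean state machine.
    """
    out = []
    dirty = entry_dirty
    for label, kind in block:
        if kind is Kind.USES_YMM:
            dirty = True
        elif kind is Kind.VZERO:
            # An explicit VZERO* already cleans the upper halves.
            dirty = False
        elif kind is Kind.CALL and dirty:
            out.append(("vzeroupper (inserted)", Kind.VZERO))
            dirty = False
        out.append((label, kind))
    return out


# The explicit vzeroupper makes a second one before the return unnecessary.
clean_case = [
    ("vaddps %ymm1, %ymm0, %ymm0", Kind.USES_YMM),
    ("vzeroupper", Kind.VZERO),
    ("retq", Kind.CALL),
]

# A YMM reload after the explicit vzeroupper dirties the state again.
reload_case = [
    ("vmovups %ymm0, spill", Kind.USES_YMM),
    ("vzeroupper", Kind.VZERO),
    ("vmovups reload, %ymm0", Kind.USES_YMM),
    ("retq", Kind.CALL),
]

for label, _ in insert_vzeroupper(reload_case):
    print(label)
```

In this model the second case still ends up with two VZEROUPPERs, because the YMM reload sits between the explicit call and the return.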

For example, why does an explicit VZEROUPPER call cause this stack spill, which then leads to yet another VZEROUPPER?

define <4 x float> @avx_in_sse_out(<8 x float> %x) nounwind {
; CHECK-LABEL: avx_in_sse_out:
; CHECK:       # BB#0:
; CHECK-NEXT:    vmovups %ymm0, -{{[0-9]+}}(%rsp) # 32-byte Spill
; CHECK-NEXT:    vzeroupper
; CHECK-NEXT:    vmovups -{{[0-9]+}}(%rsp), %ymm0 # 32-byte Reload
; CHECK-NEXT:    vzeroupper
; CHECK-NEXT:    retq
;
  %xmm = shufflevector <8 x float> %x, <8 x float> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
  call void @llvm.x86.avx.vzeroupper()
  ret <4 x float> %xmm
}

Event Timeline

spatel updated this revision to Diff 58111. May 23 2016, 10:16 AM
spatel retitled this revision to [x86, AVX] allow explicit calls to VZERO* to modify state in VZeroUpperInserter pass.
spatel updated this object.
spatel added reviewers: RKSimon, aaboud, qcolombet.
spatel added a subscriber: llvm-commits.

Your changes LGTM, Sanjay.

For the example you gave, it looks like a problem in coalescing. To eliminate a copy, coalescing extends a VR256 lifetime across the vzeroupper:

# *** IR Dump After Live Interval Analysis ***:
# Machine code for function avx_in_sse_out: Properties: <Post SSA, tracking liveness, HasVRegs>
Function Live Ins: %YMM0 in %vreg0

0B      BB#0: derived from LLVM BB %0
            Live Ins: %YMM0
16B             %vreg0<def> = COPY %YMM0; VR256:%vreg0
32B             %vreg1<def> = COPY %vreg0:sub_xmm; VR128:%vreg1 VR256:%vreg0
48B             VZEROUPPER
64B             %XMM0<def> = COPY %vreg1; VR128:%vreg1
80B             RET 0, %XMM0

# End machine code for function avx_in_sse_out.

# *** IR Dump After Simple Register Coalescing ***:
# Machine code for function avx_in_sse_out: Properties: <Post SSA, tracking liveness, HasVRegs>
Function Live Ins: %YMM0 in %vreg0

0B      BB#0: derived from LLVM BB %0
            Live Ins: %YMM0
16B             %vreg0<def> = COPY %YMM0; VR256:%vreg0
48B             VZEROUPPER
64B             %XMM0<def> = COPY %vreg0:sub_xmm; VR256:%vreg0
80B             RET 0, %XMM0

Not good! In this case, coalescing would be better off doing this:

# *** IR Dump After Simple Register Coalescing ***:
# Machine code for function avx_in_sse_out: Properties: <Post SSA, tracking liveness, HasVRegs>
Function Live Ins: %YMM0 in %vreg0

0B      BB#0: derived from LLVM BB %0
            Live Ins: %YMM0
16B             %vreg1<def> = COPY %YMM0:sub_xmm; VR128:%vreg1
48B             VZEROUPPER
64B             %XMM0<def> = COPY %vreg1; VR128:%vreg1
80B             RET 0, %XMM0
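
The difference between these dumps can be checked mechanically. Below is a minimal Python sketch that flags 256-bit virtual registers whose live range spans a VZEROUPPER. It assumes a single basic block and treats a live range as "first def to last use"; the `vr256_live_across_vzeroupper` helper and the tuple encoding of instructions are hypothetical, not LLVM's LiveIntervals API.

```python
def vr256_live_across_vzeroupper(instrs, reg_class):
    """Return sorted VR256 vregs defined before a VZEROUPPER and used after it.

    `instrs` is a list of (opcode, defs, uses) tuples naming virtual registers;
    `reg_class` maps a vreg name to its register class. Hypothetical helper
    for a single straight-line block.
    """
    first_def = {}
    last_use = {}
    for idx, (_, defs, uses) in enumerate(instrs):
        for reg in defs:
            first_def.setdefault(reg, idx)
        for reg in uses:
            last_use[reg] = idx

    crossing = set()
    for idx, (opcode, _, _) in enumerate(instrs):
        if opcode != "VZEROUPPER":
            continue
        for reg, def_idx in first_def.items():
            if reg_class.get(reg) == "VR256" and def_idx < idx < last_use.get(reg, -1):
                crossing.add(reg)
    return sorted(crossing)


classes = {"vreg0": "VR256", "vreg1": "VR128"}

# After Live Interval Analysis: vreg0 dies before the VZEROUPPER.
before_coalescing = [
    ("COPY", ["vreg0"], []),         # %vreg0 = COPY %YMM0
    ("COPY", ["vreg1"], ["vreg0"]),  # %vreg1 = COPY %vreg0:sub_xmm
    ("VZEROUPPER", [], []),
    ("COPY", [], ["vreg1"]),         # %XMM0 = COPY %vreg1
    ("RET", [], []),
]

# After Simple Register Coalescing: vreg0 is now used after the VZEROUPPER.
after_coalescing = [
    ("COPY", ["vreg0"], []),         # %vreg0 = COPY %YMM0
    ("VZEROUPPER", [], []),
    ("COPY", [], ["vreg0"]),         # %XMM0 = COPY %vreg0:sub_xmm
    ("RET", [], []),
]

print(vr256_live_across_vzeroupper(before_coalescing, classes))
print(vr256_live_across_vzeroupper(after_coalescing, classes))
```

Under these assumptions only the coalesced form reports %vreg0 as live across the VZEROUPPER, which is the condition that forces the 32-byte spill and reload.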

Thanks, Dave! I'll check this part in and then update https://llvm.org/bugs/show_bug.cgi?id=27823 with your analysis.

This revision was automatically updated to reflect the committed changes.