This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] Clamp stack alignment for memset, memmove
ClosedPublic

Authored by HaohaiWen on Oct 21 2022, 8:24 AM.

Details

Summary

memcpy has clamped dst stack alignment to NaturalStackAlignment if
hasStackRealignment is false. We should also clamp stack alignment
for memset and memmove. If we don't clamp, SelectionDAG may first
do tail call optimization which requires no stack realignment. Then
memmove, memset in same function may be lowered to load/store with
larger alignment leading to PEI emit stack realignment code which
is absolutely not correct.

Diff Detail

Event Timeline

HaohaiWen created this revision.Oct 21 2022, 8:24 AM
HaohaiWen requested review of this revision.Oct 21 2022, 8:24 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2022, 8:24 AM
RKSimon added inline comments.Oct 21 2022, 8:44 AM
llvm/test/CodeGen/X86/pr42064.ll
1

Should we be running tests with/without -stackrealign do you think?

Fix amdgpu test

HaohaiWen added inline comments.Oct 21 2022, 11:38 PM
llvm/test/CodeGen/X86/pr42064.ll
1

This is the first commit message. Looks like this tests want rbx to be used as frame pointer. Therefore we should only test -stackrealign.

[X86] Defer the creation of LCMPXCHG16B_SAVE_RBX until finalize-isel

We need to use LCMPXCHG16B_SAVE_RBX if RBX/EBX is being used as
the frame pointer. We previously checked for this during type
legalization, but that's too early to know for sure if the base
pointer is needed.

This patch adds a new pseudo instruction to emit from isel that
uses a virtual register for the RBX input. Then we use the custom
inserter hook to emit LCMPXCHG16B if RBX isn't needed as a base
pointer or LCMPXCHG16B_SAVE_RBX if it is.

Fixes PR42064.

Reviewed By: pengfei

Differential Revision: https://reviews.llvm.org/D88808
LuoYuanke added inline comments.Oct 22 2022, 6:51 PM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
7101

Maybe explain the conflict with tail call optimization in the comments as well.

HaohaiWen updated this revision to Diff 470029.Oct 23 2022, 6:59 PM

Update comments

HaohaiWen updated this revision to Diff 470030.Oct 23 2022, 7:02 PM

Delete mistakenly added files

HaohaiWen marked an inline comment as done.Oct 23 2022, 7:03 PM
wxiao3 added inline comments.Oct 23 2022, 11:02 PM
llvm/test/CodeGen/X86/memset.ll
2–4

do we also need to test "-stackrealign" option to make sure vmovaps is generated?

HaohaiWen added inline comments.Oct 24 2022, 12:00 AM
llvm/test/CodeGen/X86/memset.ll
2–4

We have tested aligned memset in llvm/test/CodeGen/X86/memset-inline.ll.

; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=avx,-avx512f | FileCheck %s --check-prefixes=GPR,AVX

define void @aligned_memset_64(ptr align 64 %a, i8 %value) nounwind {
; AVX-LABEL: aligned_memset_64:
; AVX:       # %bb.0:
; AVX-NEXT:    vmovd %esi, %xmm0
; AVX-NEXT:    vpxor %xmm1, %xmm1, %xmm1
; AVX-NEXT:    vpshufb %xmm1, %xmm0, %xmm0
; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm0, %ymm0
; AVX-NEXT:    vmovaps %ymm0, 32(%rdi)
; AVX-NEXT:    vmovaps %ymm0, (%rdi)
; AVX-NEXT:    vzeroupper
; AVX-NEXT:    retq
  tail call void @llvm.memset.inline.p0.i64(ptr align 64 %a, i8 %value, i64 64, i1 0)
  ret void
}
LuoYuanke accepted this revision.Oct 24 2022, 6:25 PM

LGTM. I notice there some comments from Simon and Wei. Pls wait 1 or 2 days in case the comments need to be addressed in the patch.

This revision is now accepted and ready to land.Oct 24 2022, 6:25 PM
This revision was landed with ongoing or failed builds.Oct 26 2022, 1:45 AM
This revision was automatically updated to reflect the committed changes.