memcpy already clamps the destination's stack alignment to NaturalStackAlignment when hasStackRealignment is false. We should clamp the stack alignment for memset and memmove as well. Without the clamp, SelectionDAG may first perform tail call optimization, which requires that the stack not be realigned; a memmove or memset in the same function may then be lowered to loads/stores with a larger alignment, causing PEI to emit stack realignment code, which is incorrect.
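For illustration, a minimal IR sketch of the problematic pattern (hypothetical: the function and callee names are invented, and this is not a test from the patch). The function contains both a tail call and a memset of a stack object whose inlined lowering could pick an alignment above the natural stack alignment:

    ; Hypothetical reproducer sketch (names invented). The memset of %buf may
    ; be inlined as wide vector stores; without the clamp, SelectionDAG could
    ; raise %buf's alignment past the natural stack alignment even though the
    ; tail call below was selected assuming no stack realignment.
    declare void @use(ptr)
    declare void @callee()
    declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)

    define void @repro() nounwind {
      %buf = alloca [64 x i8], align 8
      call void @llvm.memset.p0.i64(ptr %buf, i8 0, i64 64, i1 false)
      call void @use(ptr %buf)
      tail call void @callee()
      ret void
    }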
Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline
llvm/test/CodeGen/X86/pr42064.ll:1
    Should we be running tests with/without -stackrealign, do you think?
llvm/test/CodeGen/X86/pr42064.ll:1
    Looks like this test wants rbx to be used as the frame pointer, so we should only test -stackrealign. Quoting the commit message of the fix:

        [X86] Defer the creation of LCMPXCHG16B_SAVE_RBX until finalize-isel

        We need to use LCMPXCHG16B_SAVE_RBX if RBX/EBX is being used as the
        frame pointer. We previously checked for this during type legalization,
        but that's too early to know for sure if the base pointer is needed.

        This patch adds a new pseudo instruction to emit from isel that uses a
        virtual register for the RBX input. Then we use the custom inserter
        hook to emit LCMPXCHG16B if RBX isn't needed as a base pointer or
        LCMPXCHG16B_SAVE_RBX if it is.

        Fixes PR42064.

        Reviewed By: pengfei

        Differential Revision: https://reviews.llvm.org/D88808
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp:7101
    Maybe explain the conflict with tail call optimization in the comments as well.
llvm/test/CodeGen/X86/memset.ll:2-4
    Do we also need to test the "-stackrealign" option to make sure vmovaps is generated?
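    If such coverage were added, it might look like the following sketch (the RUN line, the check, and the expected vmovaps output are illustrative assumptions, not lines from the patch):

        ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=avx -stackrealign | FileCheck %s

        ; Hypothetical check: with realignment forced, the stack buffer can be
        ; written with aligned vector stores.
        ; CHECK: vmovaps
        declare void @use(ptr)
        declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)

        define void @t() nounwind {
          %buf = alloca [32 x i8], align 4
          call void @llvm.memset.p0.i64(ptr %buf, i8 0, i64 32, i1 false)
          call void @use(ptr %buf)
          ret void
        }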
llvm/test/CodeGen/X86/memset.ll:2-4
    We have tested aligned memset in llvm/test/CodeGen/X86/memset-inline.ll:

        ; RUN: llc < %s -mtriple=x86_64-unknown-linux-gnu -mattr=avx,-avx512f | FileCheck %s --check-prefixes=GPR,AVX

        define void @aligned_memset_64(ptr align 64 %a, i8 %value) nounwind {
        ; AVX-LABEL: aligned_memset_64:
        ; AVX:       # %bb.0:
        ; AVX-NEXT:    vmovd %esi, %xmm0
        ; AVX-NEXT:    vpxor %xmm1, %xmm1, %xmm1
        ; AVX-NEXT:    vpshufb %xmm1, %xmm0, %xmm0
        ; AVX-NEXT:    vinsertf128 $1, %xmm0, %ymm0, %ymm0
        ; AVX-NEXT:    vmovaps %ymm0, 32(%rdi)
        ; AVX-NEXT:    vmovaps %ymm0, (%rdi)
        ; AVX-NEXT:    vzeroupper
        ; AVX-NEXT:    retq
          tail call void @llvm.memset.inline.p0.i64(ptr align 64 %a, i8 %value, i64 64, i1 0)
          ret void
        }
LGTM. I notice there are some comments from Simon and Wei. Please wait 1 or 2 days in case the comments need to be addressed in the patch.