Page MenuHomePhabricator

[PowerPC] Implement probing for dynamic stack allocation
ClosedPublic

Authored by lkail on Jun 7 2020, 8:57 PM.

Details

Summary

This patch is part of supporting -fstack-clash-protection. Mainly do such things compared to existing lowerDynamicAlloc

  • Added a new pseudo instruction PPC::PREPARE_PROBED_ALLOC to get actual frame pointer and final stack pointer.
  • Synthesize a loop to probe by blocks.
  • Use DYNAREAOFFSET to get MaxCallFrameSize which is calculated in prologepilog.

Diff Detail

Unit TestsFailed

TimeTest
10,140 mslinux > libomp.env::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/env/kmp_set_dispatch_buf.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/env/Output/kmp_set_dispatch_buf.c.tmp -lm -latomic && env KMP_DISP_NUM_BUFFERS=0 /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/env/Output/kmp_set_dispatch_buf.c.tmp
1,770 mslinux > libomp.worksharing/for::Unknown Unit Message ("")
Script: -- : 'RUN: at line 1'; /mnt/disks/ssd0/agent/llvm-project/build/./bin/clang -fopenmp -pthread -fno-experimental-isel -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test -I /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/src -L /mnt/disks/ssd0/agent/llvm-project/build/lib -I /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/ompt /mnt/disks/ssd0/agent/llvm-project/openmp/runtime/test/worksharing/for/kmp_set_dispatch_buf.c -o /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/worksharing/for/Output/kmp_set_dispatch_buf.c.tmp -lm -latomic && /mnt/disks/ssd0/agent/llvm-project/build/projects/openmp/runtime/test/worksharing/for/Output/kmp_set_dispatch_buf.c.tmp 7

Event Timeline

lkail created this revision.Jun 7 2020, 8:57 PM
lkail updated this revision to Diff 269102.Jun 7 2020, 9:02 PM
lkail edited the summary of this revision. (Show Details)Jun 7 2020, 9:05 PM

Some minor comments.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
11746

Can we make it as unsigned integer ?

11775

Adding some comments to indicate what the CFG looks like will make the code more clear.

11829

Can we use the CTR loop ?

llvm/test/CodeGen/PowerPC/stack-clash-dynamic-alloca.ll
4

Check the run for pwr9

lkail updated this revision to Diff 272328.Jun 21 2020, 8:52 PM
lkail marked an inline comment as done.

Rebased and address steven's comments.

lkail added inline comments.Jun 21 2020, 8:53 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
11746

Looks getAsInteger already does it.

template <typename T>
std::enable_if_t<!std::numeric_limits<T>::is_signed, bool>
getAsInteger(unsigned Radix, T &Result) const;
lkail marked an inline comment as done.Jun 23 2020, 7:00 PM
lkail added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
11829

Yes, we can, but it looks not worth the effort, since the trip count can be zero, we have to add additional code to check this case.

lkail marked an inline comment as done.Jun 23 2020, 7:01 PM
jsji added inline comments.Jun 25 2020, 3:25 PM
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
7921

Meaningless comment.

7923

Meaningless comment.

11741

Can we set the size to sys::Process::getPageSizeEstimate()? Typical systems are all giving me 65536, are we probing too much if we set it to 4096 here?

11749

What happen with "stack-probe-size"="65536"?

The imm is known value, if we can NOT load an imm > 32768 with one instruction, we should be able to load it in two instructions. Why we have to assert here?

11750

What about "stack-probe-size"="0"? When user would like to opt-out probe for some specific function? See https://reviews.llvm.org/rL225360`. We will now generate code divided by 0...

Also what about "stack-probe-size"="1"? I think we should also round the size to stack alignment similar to Z.

11754

I don't quite understand this. Why we can NOT do this similar to x86 or SystemZ? using PHI nodes.

11770

LLVM_BB ? Why LLVM? Can we use more meaningful name?

11788

calculate?

lkail updated this revision to Diff 273990.Jun 28 2020, 8:12 PM

Address @jsji 's comments. Add test case for large probe size.

lkail marked 4 inline comments as done.Jun 28 2020, 8:13 PM
lkail marked 2 inline comments as done.Jun 28 2020, 8:20 PM
lkail added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
11741

Yes, sysconf gives 65536 on my rhel POWER machine and 4096 on my rhel X86 machine. I'm quite concerned if users do cross compilation if the host has large page size than the target.

11754

The comment to compare with PPCRegisterInfo::lowerDynamicAlloc might be a little bit misleading. My point is PPCTargetLowering::emitProbedAlloca is unable to expaned in prologepilog like PPCRegisterInfo::lowerDynamicAlloc. Should I delete this comparison?

Looks we don't need phi nodes in the loop, only r1, which is a physical register, is updated.

jsji accepted this revision as: jsji.Jun 30 2020, 1:39 PM

LGTM.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
11741

How about we introduce an option for user to switch the default value here, eg: default is still the smallest, 4096, but we provide another choice to use system default (65536 most of the time) .

11754

yeah, maybe remove the comments comparing to lowerDynamicAlloc.

llvm/lib/Target/PowerPC/PPCRegisterInfo.cpp
614

Meaningless comments.

This revision is now accepted and ready to land.Jun 30 2020, 1:39 PM
This revision was automatically updated to reflect the committed changes.