This is an archive of the discontinued LLVM Phabricator instance.


Differential D78717

[SystemZ] Implement -fstack-clash-protection
ClosedPublic

Authored by jonpa on Apr 23 2020, 8:04 AM.


Details

Reviewers

uweigand

Commits

rG515bfc66eace: [SystemZ] Implement -fstack-clash-protection

Summary

This is a first attempt at implementing this, based on GCC and the X86 LLVM backend. The prologue and dynamic allocas are handled, but some questions remain:

Prologue:
- GCC emits an 'lgr %r15,%r1' after the loop, which seems redundant, since it is known that %r15 has the value of %r1 already. Is this required to exist for some reason (omitted by patch for now)?
- gcc seems not to be probing the residual allocation after the loop. However if only two (unrolled) allocations were made, the residual is also probed.
- I am not aware of any real reason to not simply do the probing directly in emitPrologue(), but it seems wisest to do like X86 since inlineStackProbe() is called from common-code. Perhaps this relates to implementing shrink-wrapping or other things?
- emitBlockAfter() and splitBlockBefore() are copied from SystemZISelLowering. Make them SystemZInstrInfo members instead?
- A little unsure about the use of unsigned vs uint64_t...

Dynamic allocas:
- I copied the X86 tests over as SystemZ tests and noticed that SystemZ gets these test cases built by SelectionDAGBuilder with dynamic_stackalloc nodes, while X86 seems to get these (constant) allocas merged into the stack frame. This is also true without this patch, but I am not sure why. In this case it seems even more preferable to avoid the dynamic_stackalloc nodes whenever possible...
- With dynamic allocas, it seems wise to always probe no matter what the size, but the "tail" in emitProbedAlloca() is not probed. This seems flawed to me:

First of all, there could be multiple dynamic allocas in a function and if they all are less than the ProbeSize a huge span could be built up without any probing:

Tail1 Tail2 Tail3 Tail4   -->
|     |     |     |    |
          GGGGGGGGG

Then I am also worried about exiting the loop and allocating the remainder since only the topmost word in each allocated block is probed. If the guard page lies very close to that, and the remainder is relatively big, the bottom of the stack could end up way past the guard page:

Block1  Block2  Tail       -->
|P      |P      |      |
          GGGGGGGGG

P = Probe, G = Guard page

This looks bad to me, but I really don't know - is this perhaps considered harmless for some reason?
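The first concern above (several small dynamic allocas stepping over the guard page) can be illustrated with a toy model. This is only a sketch, not LLVM code: `simulateAllocas`, `Outcome`, and all addresses are hypothetical, the stack is assumed to grow downward, alloca sizes are assumed to be multiples of 8, and each probe is assumed to touch the topmost doubleword of the area just allocated.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch.
enum class Outcome { Detected, Skipped, Safe };

// Toy model, not LLVM code. The guard page covers [GuardLo, GuardHi).
// Every full block of ProbeSize bytes is probed at its topmost doubleword;
// the tail (the part smaller than ProbeSize) is probed only if ProbeTail is set.
Outcome simulateAllocas(uint64_t SP, uint64_t GuardLo, uint64_t GuardHi,
                        uint64_t ProbeSize,
                        const std::vector<uint64_t> &AllocaSizes,
                        bool ProbeTail) {
  assert(ProbeSize >= 8 && "probe size must cover one doubleword");
  auto hitsGuard = [&](uint64_t Addr) {
    return Addr >= GuardLo && Addr < GuardHi;
  };
  for (uint64_t Size : AllocaSizes) {
    // Full blocks: allocate ProbeSize bytes, then probe the top of the block.
    for (; Size >= ProbeSize; Size -= ProbeSize) {
      SP -= ProbeSize;
      if (hitsGuard(SP + ProbeSize - 8))
        return Outcome::Detected;
    }
    // Tail: allocate the remainder and optionally probe its top doubleword.
    if (Size != 0) {
      SP -= Size;
      if (ProbeTail && hitsGuard(SP + Size - 8))
        return Outcome::Detected;
    }
  }
  // No probe faulted. If SP is now below the guard page, it was jumped over.
  return SP < GuardLo ? Outcome::Skipped : Outcome::Safe;
}
```

With five 2048-byte allocas, a 4096-byte probe size, and a guard page two pages below the starting stack pointer, the model reports `Skipped` when tails are not probed and `Detected` once they are.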

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jonpa created this revision. Apr 23 2020, 8:04 AM

Herald added subscribers: dexonsmith, hiraditya, kristof.beyls. Apr 23 2020, 8:04 AM

Not looking at the code so far, but just answering your questions:

GCC emits an 'lgr %r15,%r1' after the loop, which seems redundant, since it is known that %r15 has the value of %r1 already. Is this required to exist for some reason (omitted by patch for now)?

I agree that this looks redundant.

gcc seems not to be probing the residual allocation after the loop. However if only two (unrolled) allocations were made, the residual is also probed.

I'm not seeing this, do you have an example?

I am not aware of any real reason to not simply do the probing directly in emitPrologue(), but it seems wisest to do like X86 since inlineStackProbe() is called from common-code. Perhaps this relates to implementing shrink-wrapping or other things?

The comment you're adding says:

// stack probing may involve looping, and control flow generations is
// disallowed at this point. Rely to later processing through
// `inlineStackProbe`.

That would seem to be the reason why it cannot be done directly inline, right?

emitBlockAfter(), splitBlockBefore() copied from SystemZISelLowering. Make into SystemZInstrInfo members instead?

Makes sense to me.

A little unsure about the use of unsigned vs uint64_t...

Huh. I'd probably just use uint64_t everywhere.

I took the X86 tests and copied them over as SystemZ tests and noticed that SystemZ gets these test cases built by SelectionDAGBuilder with dynamic_stackalloc nodes, while X86 seem to get these (constant) allocas merged into the stack frame. This is true also without this patch, but I am not sure why. In this case it seems even more preferred to avoid the dynamic_stackalloca nodes whenever possible...

That's strange. Not sure why we should be different from X86 here. Can you investigate?

With dynamic allocas, it seems wise to always probe no matter what the size, but the "tail" in emitProbedAlloca() is not probed. This seems flawed to me:

I agree, we need to always probe for dynamic allocas as far as I can see. GCC does this as well.

As a general note, the code duplication between inlineStackProbe and emitProbedAlloca is a bit unfortunate, not sure if there's a better way here.

Herald added a project: Restricted Project. Apr 28 2020, 8:48 AM

Herald added a subscriber: llvm-commits.

gcc seems not to be probing the residual allocation after the loop. However if only two (unrolled) allocations were made, the residual is also probed.

I'm not seeing this, do you have an example?

void large_stack() {
  volatile int stack[2000], i;
  for (i = 0; i < sizeof(stack) / sizeof(int); ++i)
    stack[i] = i;
}

With stack[2000], I see

aghi    %r15,-4096
cg      %r0,4088(%r15)
aghi    %r15,-4072
cg      %r0,4064(%r15)

With stack[8000], I don't see a probe after the loop...

I am not aware of any real reason to not simply do the probing directly in emitPrologue(), but it seems wisest to do like X86 since inlineStackProbe() is called from common-code. Perhaps this relates to implementing shrink-wrapping or other things?

The comment you're adding says:
// stack probing may involve looping, and control flow generations is
// disallowed at this point. Rely to later processing through
// `inlineStackProbe`.
That would seem to be the reason why it cannot be done directly inline, right?

Yes, but I can't see anything other than that comment that (at least currently) demands this - the only use for both emitPrologue() and inlineStackProbe() are right after each other in PEI::insertPrologEpilogCode().

I took the X86 tests and copied them over as SystemZ tests and noticed that SystemZ gets these test cases built by SelectionDAGBuilder with dynamic_stackalloc nodes, while X86 seem to get these (constant) allocas merged into the stack frame. This is true also without this patch, but I am not sure why. In this case it seems even more preferred to avoid the dynamic_stackalloca nodes whenever possible...

That's strange. Not sure why we should be different from X86 here. Can you investigate?

This was due to SystemZ having StackRealignable set to false, and the allocas of those test cases were aligned to 16. Removing the alignments in the tests fixed it (see FunctionLoweringInfo.cpp:143).

As a general note, the code duplication between inlineStackProbe and emitProbedAlloca is a bit unfortunate, not sure if there's a better way here.

I am a little unsure of how performance critical this is... It seems that the handling of inlineStackProbe with a compile-time constant stack size is an optimization of the general case of an unknown size. It seems to be rather important, since the loop is unrolled if only a few iterations are required.

I think we could emit a pseudo from emitProbedAlloca() and handle that in SystemZFrameLowering instead in the same place, if we would allow the case of a constant size be loaded into a register before the loop. The loop itself might also have one extra instruction or so, but the common case would be to unroll that loop...

Can R0D and R1D be assumed to be available in the prologue at this point? (how about anyreg-callconv?)

tstellar added a subscriber: serge-sans-paille. May 1 2020, 5:10 PM

tstellar added a subscriber: tstellar.

In D78717#2014532, @jonpa wrote:
gcc seems not to be probing the residual allocation after the loop. However if only two (unrolled) allocations were made, the residual is also probed.

I'm not seeing this, do you have an example?
void large_stack() {
  volatile int stack[2000], i;
  for (i = 0; i < sizeof(stack) / sizeof(int); ++i)
    stack[i] = i;
}
With stack[2000], I see
aghi    %r15,-4096
cg      %r0,4088(%r15)
aghi    %r15,-4072
cg      %r0,4064(%r15)
With stack[8000], I don't see a probe after the loop...

Well, I see it:

        lay     %r1,-28672(%r15)
.L5:
        lay     %r15,-4096(%r15)
        cg      %r0,4088(%r15)
        clgr    %r15,%r1
        jh      .L5
        lgr     %r15,%r1
        lay     %r15,-3496(%r15)
        cg      %r0,3488(%r15)

There's one cg in the loop, and this last cg for the tail. That's with GCC 9, not sure if that changed recently.
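The two GCC listings quoted above follow the same arithmetic: the frame is split into full 4096-byte blocks plus a residual, and each `cg` touches the topmost doubleword of the area just allocated. A minimal sketch of that split follows. `ProbePlan` and `planProbes` are hypothetical names, and the frame sizes 8168 and 32168 are inferred by summing the displacements in the listings rather than stated anywhere in the review.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of how a frame allocation is split up.
struct ProbePlan {
  uint64_t NumFullBlocks;     // allocations of exactly ProbeSize bytes
  uint64_t Residual;          // final allocation, smaller than ProbeSize
  uint64_t BlockProbeDisp;    // probe displacement from the new %r15 (full block)
  uint64_t ResidualProbeDisp; // same for the residual; 0 if there is none
};

// Sketch only: splits FrameSize into full blocks and a residual, and computes
// the displacement that addresses the topmost doubleword of each new area.
ProbePlan planProbes(uint64_t FrameSize, uint64_t ProbeSize) {
  assert(ProbeSize >= 8 && ProbeSize % 8 == 0 && "probe size must be 8-aligned");
  assert(FrameSize % 8 == 0 && "frame size must be 8-aligned");
  ProbePlan P;
  P.NumFullBlocks = FrameSize / ProbeSize;
  P.Residual = FrameSize % ProbeSize;
  P.BlockProbeDisp = ProbeSize - 8;
  P.ResidualProbeDisp = P.Residual != 0 ? P.Residual - 8 : 0;
  return P;
}
```

For a frame of 8168 bytes this gives one full block (probe at 4088) and a residual of 4072 (probe at 4064), matching the unrolled sequence. For 32168 bytes it gives seven blocks (7 * 4096 = 28672) and a residual of 3496 (probe at 3488), matching the loop form.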

I am not aware of any real reason to not simply do the probing directly in emitPrologue(), but it seems wisest to do like X86 since inlineStackProbe() is called from common-code. Perhaps this relates to implementing shrink-wrapping or other things?

The comment you're adding says:
// stack probing may involve looping, and control flow generations is
// disallowed at this point. Rely to later processing through
// `inlineStackProbe`.
That would seem to be the reason why it cannot be done directly inline, right?
Yes, but I can't see anything other than that comment that (at least currently) demands this - the only use for both emitPrologue() and inlineStackProbe() are right after each other in PEI::insertPrologEpilogCode().

Hmm, not sure either. But either the comment is correct or not; if it is, we need this split, if it isn't, the comment should be removed :-) Can you ask who added that callback for Intel why they thought this was necessary?

As a general note, the code duplication between inlineStackProbe and emitProbedAlloca is a bit unfortunate, not sure if there's a better way here.

I am a little unsure of how performance critical this is... It seems that the handling of inlineStackProbe with a compile-time constant stack size is an optimization of the general case of an unknown size. It seems to be rather important, since the loop is unrolled if only a few iterations are required.

I think we could emit a pseudo from emitProbedAlloca() and handle that in SystemZFrameLowering instead in the same place, if we would allow the case of a constant size be loaded into a register before the loop. The loop itself might also have one extra instruction or so, but the common case would be to unroll that loop...

Hmm, there's probably enough differences that may make it not worth spending effort here, given that you've already implemented it now.

Can R0D and R1D be assumed to be available in the prologue at this point? (how about anyreg-callconv?)

Yes. See SystemZCallingConv.td:

// "All registers" as used by the AnyReg calling convention.
// Note that registers 0 and 1 are still defined as intra-call scratch
// registers that may be clobbered e.g. by PLT stubs.

With dynamic allocas, it seems wise to always probe no matter what the size, but the "tail" in emitProbedAlloca() is not probed. This seems flawed to me:

I just had a look at the X86 code and indeed, it's a flaw. I'll propose a patch.

As a general note, the code duplication between inlineStackProbe and emitProbedAlloca is a bit unfortunate, not sure if there's a better way here.

I'll explore that aspect too, thanks for the code review :-)

In D78717#2018725, @serge-sans-paille wrote:

With dynamic allocas, it seems wise to always probe no matter what the size, but the "tail" in emitProbedAlloca() is not probed. This seems flawed to me:

I just had a look at the X86 code and indeed, it's a flaw. I'll propose a patch.

As a general note, the code duplication between inlineStackProbe and emitProbedAlloca is a bit unfortunate, not sure if there's a better way here.

I'll explore that aspect too, thanks for the code review :-)

Ah, thanks for clarifying this :-)

It would also be useful if you could explain the comment you added "stack probing may involve looping, and control flow generations is disallowed at this point. Rely to later processing through inlineStackProbe". Is this still true? It probably is, I just can't see why this is needed since emitPrologue() and inlineStackProbe() are run nearly right after each other in PEI::insertPrologEpilogCode().

@serge-sans-paille: Last question was for you, which I forgot to write...

In D78717#2020345, @jonpa wrote:

@serge-sans-paille: Last question was for you, which I forgot to write...

Yeah, I was busy with another bug today, I'll probably take the time to review that tonight / tomorrow. Thanks for the heads up!

As a general note, the code duplication between inlineStackProbe and emitProbedAlloca is a bit unfortunate, not sure if there's a better way here.

I'll explore that aspect too, thanks for the code review :-)

At first glance, the code is indeed very similar, to the exception of the handling of the tail (no probing needed for the tail in inlineStackProbe, but one is required for emitProbedAlloca) and the fact that the Size of the alloca is a constant in one case, and not in the other case. But it looks like some extra argument could do the trick.

It would also be useful if you could explain the comment you added "stack probing may involve looping, and control flow generations is disallowed at this point. Rely to later processing through inlineStackProbe". Is this still true? It probably is, I just can't see why this is needed since emitPrologue() and inlineStackProbe() are run nearly right after eachother in PEI::insertPrologEpilogCode().

It's clearer if I just state that we're reusing the existing probe infrastructure. I'm not a MachineInstruction expert and don't recall why I wrote this :-/

It would also be useful if you could explain the comment you added "stack probing may involve looping, and control flow generations is disallowed at this point. Rely to later processing through inlineStackProbe". Is this still true? It probably is, I just can't see why this is needed since emitPrologue() and inlineStackProbe() are run nearly right after eachother in PEI::insertPrologEpilogCode().

It's clearer if I just state that we're reusing the existing probe infrastructure. I'm not a MachineInstruction expert and don't recall why I wrote this :-/

Ah, I see. I suppose then that it is sort of "voluntary" for a target to take the trouble of using the stub with metadata, depending on whether it would simplify things? The original comment by Andy Ayers in his commit message from 2015 ( 809cbe9) formulates this in a weaker way: "to avoid complications". I am not sure exactly what complications might arise (in the case where an MBB is both a SaveBlock and a RestoreBlock, the particular way of splitting may get more complicated, perhaps?), but perhaps it would avoid confusion if you changed your comment to use Andy's wording instead...

@AndyAyers : Do you recall your reasons from back then?

There's one cg in the loop, and this last cg for the tail. That's with GCC 9, not sure if that changed recently.

huh - I also see it with gcc 9.2.1, but not with a more recent version (20200425)...

Backend part updated.

Added probing of the tail of dyn-alloca, with an extra check for zero tail, roughly like GCC. It seems it can be assumed that the alloca size will always be a multiple of 8 bytes, right? I think that is necessary, or there might be a final probe partially below SP (if the tail is e.g. 4 bytes). Also, if the stack-probe size is 0, an infinite loop would result (assert?), but I suppose that would always be noticeable.

(re)compute live-ins for the new blocks created in inlineStackProbe() (it seems this is not needed for the probed alloca case). The generated code from inlineStackProbe() does roughly the same as GCC. We could use a brct for the loop instead, or we could try to keep it at 12 or more instructions using multiple exits (like "forced" unrolling), but I suppose maybe the unrolling is the common case, so the loop isn't that important?

Tests: Not sure if we need all the X86 test cases - for instance stack-clash-medium-natural-probes.ll seems to generate nearly the same code as stack-clash-medium.ll. stack-clash-unknown-call.ll: Changed the called function from memset to something not known, since the point of the test case seemed to be to have a call in the function, which however did not work at first since an XC loop resulted.

Moved the utility functions to SystemZInstrInfo.cpp as agreed before.

Huh. I'd probably just use uint64_t everywhere.

I think what confused me a bit was that MCCFIInstruction::createDefCfaOffset() takes an 'int' as argument, but I suppose it's not the end of the world if the CFA offset is possibly broken if the stack frame size ever became greater than -INT32_MIN...?

In D78717#2022095, @jonpa wrote:

It would also be useful if you could explain the comment you added "stack probing may involve looping, and control flow generations is disallowed at this point. Rely to later processing through inlineStackProbe". Is this still true? It probably is, I just can't see why this is needed since emitPrologue() and inlineStackProbe() are run nearly right after eachother in PEI::insertPrologEpilogCode().

It's clearer if I just state that we're reusing the existing probe infrastructure. I'm not a MachineInstruction expert and don't recall why I wrote this :-/

Ah, I see. I suppose then that it is sort of "voluntary" for a target to take the trouble of using the stub with metadata, depending on if it would simplify things? The original comment by Andy Ayers in his commit message from 2015 ( 809cbe9) formulates this in a weaker way: "to avoid complications". I am not sure exactly what complications that might arise (in the case where an MBB is both a SaveBlock and RestoreBlock the particular way of splitting may get more complicated, perhaps?), but perhaps it would avoid confusion if you changed your comment to use Andys wording instead...

@AndyAyers : Do you recall your reasons from back then?

Unfortunately, no. I did a quick look for notes I might have made at the time and didn't find any.

In D78717#2025239, @jonpa wrote:

Added probing of the tail of dyn-alloca, with an extra check for zero tail, roughly like GCC. It seems it can be assumed that the alloca size will always be a multiple of 8 bytes, or? I think that is necessary, or there might be final probe partially below SP (if tail is e.g. 4 bytes). Also, if the stack-probe size is 0, an infinite loop would result (assert?), but I suppose that would always be noticeable.

The size argument to DYNAMIC_STACKALLOC is indeed always a multiple of 8 (as documented in ISDOpcodes.h).

As to the stack probe size, I think we must indeed ensure that this size is also a multiple of 8 bytes (the stack alignment requirement) and is nonzero. If the given value doesn't fulfil those requirements, I guess we should round it down to a multiple of the stack alignment, and if the result is zero, use the stack alignment itself instead.

(re)compute live-ins for the new blocks created in inlineStackProbe() (it seems this is not needed for the probed alloca case). The generated code from inlineStackProbe() does roughly the same as GCC. We could use a brct for the loop instead, or we could try to keep it at 12 or more instructions using multiple exits (like "forced" unrolling), but I suppose maybe the unrolling is the common case, so the loop isn't that important?

I think just having a plain loop is OK for now. For any future optimizations, we'd have to do performance evaluation first.

Tests: Not sure if we need all the X86 test cases - for instance stack-clash-medium-natural-probes.ll seems to generate nearly the same code as stack-clash-medium.ll.

It seems this was added in preparation for exploiting "natural probes" -- if the code can be proven to already touch all (or at least some) stack pages at least once (in this test case via the stores), then we might optimize out some of the extra probes. But apparently this wasn't implemented in the X86 back-end in the end, and neither is it in your patch, so it doesn't make sense to have a test for it.

stack-clash-unknown-call.ll: Changed the called function from memset to something not known, since the point of the test case seemed to be to have a call in the function, which however did not work at first since an XC loop resulted.

I believe this is also related to natural probes: on *Intel*, every call instruction would be a natural probe as it pushes the return address to the stack. But on Z, the call instruction doesn't touch the stack (the called function might, but it also might not, so we cannot really rely on it).

Moved the utility functions to SystemZInstrInfo.cpp as agreed before.

OK.

Huh. I'd probably just use uint64_t everywhere.

I think what confused me a bit was that MCCFIInstruction::createDefCfaOffset() takes an 'int' as argument, but I suppose it's not the end of the world if the CFA offset is possibly broken if the stack frame size ever became greater than -INT32_MIN...?

Hmm. In theory, those DWARF offset values are encoded as ULEB128, so there's no reason to constrain them to an "int". If this ever becomes an issue, we should update MCCFIInstruction to use int64_t instead. For now, that's probably not a real concern; I'm sure there's other code that doesn't handle stack sizes > 2GB correctly. (E.g. do we even update the stack pointer correctly in this case?)
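To illustrate why the encoding itself does not limit the offset to an "int", here is a minimal ULEB128 encoder sketch. `encodeUleb128Sketch` is a hypothetical name chosen for this example and is not the helper LLVM uses.

```cpp
#include <cstdint>
#include <vector>

// Sketch of unsigned LEB128: emit 7 bits per byte, least significant group
// first, and set the high bit on every byte except the last one.
std::vector<uint8_t> encodeUleb128Sketch(uint64_t Value) {
  std::vector<uint8_t> Bytes;
  do {
    uint8_t Byte = Value & 0x7f;
    Value >>= 7;
    if (Value != 0)
      Byte |= 0x80; // more groups follow
    Bytes.push_back(Byte);
  } while (Value != 0);
  return Bytes;
}
```

A value of 3 GB (0xC0000000), which does not fit in a signed 32-bit integer, simply takes five bytes in this encoding, so any limit comes from the types used in the API rather than from the format.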

As to the stack probe size, I think we must indeed ensure that this size is also a multiple of 8 bytes (the stack alignment requirement) and is nonzero. If the given value doesn't fulfil those requirements, I guess we should round it down to a multiple of the stack alignment, and if the result is zero, use the stack alignment itself instead.

I implemented this in getStackProbeSize() with a few new tests.
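The rounding rule described above can be sketched as follows. This is not the committed getStackProbeSize(); `normalizeProbeSize` is a hypothetical stand-alone function that only assumes the stack alignment is a power of two (8 on SystemZ, per the discussion).

```cpp
#include <cassert>
#include <cstdint>

// Sketch: round the requested probe size down to a multiple of the stack
// alignment, and fall back to the alignment itself if that yields zero.
uint64_t normalizeProbeSize(uint64_t Requested, uint64_t StackAlign) {
  assert(StackAlign != 0 && (StackAlign & (StackAlign - 1)) == 0 &&
         "alignment must be a power of two");
  uint64_t Rounded = Requested & ~(StackAlign - 1);
  return Rounded != 0 ? Rounded : StackAlign;
}
```

A result that is nonzero and 8-aligned rules out both the infinite loop mentioned earlier (probe size 0) and a probe that would land partially below the stack pointer.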

I think just having a plain loop is OK for now. For any future optimizations, we'd have to do performance evaluation first.

I built and ran SPEC'17 once and it looked like there was no sign of regressions, at most a single percent...

It seems this was added in preparation for exploiting "natural probes" -- if the code can be proven to already touch all (or at least some) stack pages at least once (in this test case via the stores), then we might optimize out some of the extra probes. But apparently this wasn't implemented in the X86 back-end in the end, and neither is it in your patch, so it doesn't make sense to have a test for it.

Ah, now I get it. Removed those tests.

I believe this is also related to natural probes: on *Intel*, every call instruction would be a natural probe as it pushes the return address to the stack. But on Z, the call instruction doesn't touch the stack (the called function might, but it also might not, so we cannot really rely on it).

removed

Merged remaining X86 "small/medium/large" into stack-clash-protection.ll.

It seems this was added in preparation for exploiting "natural probes" -- if the code can be proven to already touch all (or at least some) stack pages at least once (in this test case via the stores), then we might optimize out some of the extra probes. But apparently this wasn't implemented in the X86 back-end in the end, and neither is it in your patch, so it doesn't make sense to have a test for it.

Correct!

I believe this is also related to natural probes: on *Intel*, every call instruction would be a natural probe as it pushes the return address to the stack. But on Z, the call instruction doesn't touch the stack (the called function might, but it also might not, so we cannot really rely on it).

Confirmed!

I experimented with inserting the probing directly in emitPrologue() instead of building the call with metadata, but then came back to the case with a single block function. Looking closer at this, it seems that this could be the motivation behind using the call/metadata stub. Splitting the Prologue/Epilogue block to insert the loop would cause the RestoreBlocks set of PEI to be incorrect, since the epilogue block has changed from MBB to DoneMBB (block after the loop / second half of original MBB).

This could theoretically be fixed by rearranging things so that MBB ended up to have the same instructions and the same place as DoneMBB now gets, but that seems impractical and besides that would break the SaveBlocks set of PEI... (I think perhaps a comment explaining this would be nice.)
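The bookkeeping problem can be shown with a toy model in which blocks are just lists of instruction names. Nothing here is LLVM API: `ToyBlock`, `splitBlockBeforeSketch`, and `restoreBlocksStillValid` are hypothetical, and the "restore set" is only a stand-in for the RestoreBlocks that PEI records before the prologue is emitted.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

using ToyBlock = std::vector<std::string>; // instruction names only

// Moves every instruction from index SplitAt onward into a new block that is
// appended to Func, and returns the index of that new block.
std::size_t splitBlockBeforeSketch(std::vector<ToyBlock> &Func,
                                   std::size_t BlockIdx, std::size_t SplitAt) {
  assert(BlockIdx < Func.size() && SplitAt <= Func[BlockIdx].size());
  ToyBlock Tail(Func[BlockIdx].begin() + SplitAt, Func[BlockIdx].end());
  Func[BlockIdx].resize(SplitAt);
  Func.push_back(std::move(Tail));
  return Func.size() - 1;
}

// A recorded restore block is only still valid if it ends with the return.
bool restoreBlocksStillValid(const std::vector<ToyBlock> &Func,
                             const std::set<std::size_t> &RestoreBlocks) {
  for (std::size_t Idx : RestoreBlocks)
    if (Idx >= Func.size() || Func[Idx].empty() || Func[Idx].back() != "ret")
      return false;
  return true;
}
```

In a single-block function, the one block is both the save block and the restore block. Splitting it to make room for a probe loop moves the return into the new block, so a restore set recorded beforehand still names the old block and is stale.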

There is an assert for the metadata attached to the call instruction, which kind of guards against users defining a function with the same name. Normally, I would have expected a pseudo MachineInstruction to be built here with an immediate operand for the size requirement...

In D78717#2036041, @jonpa wrote:

Normally, I would have expected a pseudo MachineInstruction to be built here with an immediate operand for the size requirement...

That sounds very nice. If you go that way I'll happily update the X86 code accordingly. Or If you point me to some example / doc / code on pseudo MachineInstruction, I can implement that too.

serge-sans-paille added inline comments. May 15 2020, 6:44 AM

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
671	Maybe we should syndicate that unroll parameter somewhere across architectures?
709	On X86, we don't need to probe that final allocation because the two ways the stack could grow after the final alloca are:
- A function call, and in that case we get a free probe when we make a function call.
- A PROBED_ALLOCA, and in that case we get a probe at Residual + PAGE_SIZE, which is right into the Page Guard.
I assume that's different for SystemZ?

jonpa marked 4 inline comments as done. May 18 2020, 2:31 AM

jonpa added inline comments.

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
671	I am not sure I see the benefit of that since this entire method is already defined by the target.
709	There's not always a free probe with a function call on SystemZ (only if stack space is allocated by the called function), but regardless of that I think the residual is needed. Since the probe is done on the topmost byte of each allocated block on SystemZ, the guard page could fit within the last allocated full block and a residual of just 8 or more bytes (probing the high byte is what GCC is doing on SystemZ). I still wonder if that would not be necessary also on X86 in case "2": Given that you on X86 probe the lower byte of each full block, you could get a residual into the guard page, and then if the dynamic alloca started with a full block, the next probe would not be into the guard page, but past it:

| | |GGGGG| |
| P| R| P|

Maybe I am missing something?

In D78717#2038456, @serge-sans-paille wrote:

In D78717#2036041, @jonpa wrote:

Normally, I would have expected a pseudo MachineInstruction to be built here with an immediate operand for the size requirement...

That sounds very nice. If you go that way I'll happily update the X86 code accordingly. Or If you point me to some example / doc / code on pseudo MachineInstruction, I can implement that too.

@uweigand : Would you rather use a target pseudo instruction for this rather than using a call with metadata?

In D78717#2041281, @jonpa wrote:

In D78717#2038456, @serge-sans-paille wrote:

In D78717#2036041, @jonpa wrote:

Normally, I would have expected a pseudo MachineInstruction to be built here with an immediate operand for the size requirement...

That sounds very nice. If you go that way I'll happily update the X86 code accordingly. Or If you point me to some example / doc / code on pseudo MachineInstruction, I can implement that too.

@uweigand : Would you rather use a target pseudo instruction for this rather than using a call with metadata?

Using a pseudo is probably a cleaner solution, yes.

Use a pseudo instead of call with metadata.

@serge-sans-paille:

That sounds very nice. If you go that way I'll happily update the X86 code accordingly. Or If you point me to some example / doc / code on pseudo MachineInstruction, I can implement that too.

This patch has now been changed to use a pseudo instead, so you can see how that works here...

Also, I wonder what you think about my previous question to you about the probing on X86 ..?

lkail added a subscriber: lkail. May 25 2020, 5:47 AM

serge-sans-paille added inline comments. May 25 2020, 6:08 AM

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
709	I took my pen and paper, and you're definitely right. Fortunately, there's the possibility of always probing the upper byte when doing a dynamic alloca, so that we always avoid this extra probe. That way the common case (alloc_size < PAGE_SIZE) remains costless. Correct?

jonpa marked an inline comment as done. May 27 2020, 12:57 AM

jonpa added inline comments.

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
709	IIUC, the topmost byte of each allocated stack range is always naturally probed before the entry of the prologue by the push of the return address by the call. The lowermost byte is then probed in each full block, but the residual needs no probing. In order to get matching behavior with dynamic allocas, the topmost byte should always be probed, as well as the lowermost byte of each full block. So yes, I think that would be correct as long as there are no other allocations of stack space anywhere...

serge-sans-paille added inline comments. Jun 3 2020, 3:17 AM

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
625	could be:
auto Where = llvm::find_if(PrologMBB, [](MachineInstr &MI) {
  // Match the probed stack allocation pseudo emitted by the prologue.
  return MI.getOpcode() == SystemZ::STACKALLOC_W_PROBING;
});
if (Where == PrologMBB.end())
  return;
MachineInstr &StackAllocMI = *Where;
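The same search pattern can be tried outside of LLVM with `std::find_if`, with the end-iterator check written as a comparison (`==`). The types below are toy stand-ins: `ToyInstr`, `Opcode`, and `findProbedStackAlloc` are hypothetical and only mimic the shape of the suggestion.

```cpp
#include <algorithm>
#include <vector>

// Toy stand-ins for MachineInstr and its opcode; not LLVM types.
enum class Opcode { Other, ProbedStackAlloc };
struct ToyInstr {
  Opcode Op;
  unsigned FrameSize;
};

// Returns a pointer to the first probed stack allocation in Block, or
// nullptr if the block contains none.
const ToyInstr *findProbedStackAlloc(const std::vector<ToyInstr> &Block) {
  auto Where = std::find_if(Block.begin(), Block.end(), [](const ToyInstr &MI) {
    return MI.Op == Opcode::ProbedStackAlloc;
  });
  if (Where == Block.end())
    return nullptr;
  return &*Where;
}
```

Returning early when nothing is found mirrors the `return;` in the suggestion: a prologue block without the pseudo needs no expansion.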

Patch rebased (using cfiDefCfaOffset() instead of createDefCfaOffset()).

Check for a free probe (STMG) in prologue in which case probing is not done when the space between that STMG and the new stack pointer is less than the probe size. This saves the vast majority of the probing (removes 95% of the CG instructions and 90% of the number of files changed).

A few cosmetic issues mentioned inline, otherwise this now looks good to me. Thanks!

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp
409	I believe this will now fit onto a single line.
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp
6860	The RHS can be simplified to ~(StackAlign - 1)
llvm/lib/Target/SystemZ/SystemZInstrInfo.td
38	Doesn't this also need Defs and Uses to be fully correct?
39	For consistency with the PROBED_ALLOCA name, maybe this should be called PROBED_STACKALLOC?

This revision is now accepted and ready to land. Jun 5 2020, 8:26 AM

Thanks for review - committed as 515bfc6 after last updates.

llvm/lib/Target/SystemZ/SystemZInstrInfo.td
38	Ah, yes I suppose that might as well be there... I added the defs and uses and also the side-effects flag since it may expand into a loop.

Closed by commit rG515bfc66eace: [SystemZ] Implement -fstack-clash-protection (authored by jonpa). Jun 6 2020, 10:08 AM

This revision was automatically updated to reflect the committed changes.

jonpa marked an inline comment as done.

Herald added a project: Restricted Project. Jun 6 2020, 10:08 AM

Herald added a subscriber: cfe-commits.

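Revision Contents

Path	Size
clang/docs/ReleaseNotes.rst	4 lines
clang/lib/Basic/Targets/SystemZ.h	4 lines
clang/lib/Driver/ToolChains/Clang.cpp	2 lines
clang/test/CodeGen/stack-clash-protection.c	1 line
clang/test/Driver/stack-clash-protection-02.c	13 lines
llvm/include/llvm/ADT/Triple.h	5 lines
llvm/lib/Target/SystemZ/SystemZFrameLowering.h	2 lines
llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp	145 lines
llvm/lib/Target/SystemZ/SystemZISelLowering.h	9 lines
llvm/lib/Target/SystemZ/SystemZISelLowering.cpp	207 lines
llvm/lib/Target/SystemZ/SystemZInstrInfo.h	10 lines
llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp	24 lines
llvm/lib/Target/SystemZ/SystemZInstrInfo.td	9 lines
llvm/lib/Target/SystemZ/SystemZOperators.td	6 lines
llvm/test/CodeGen/SystemZ/stack-clash-dynamic-alloca.ll	136 lines
llvm/test/CodeGen/SystemZ/stack-clash-protection.ll	242 lines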

Diff 269023

clang/docs/ReleaseNotes.rst

	Show First 20 Lines • Show All 88 Lines • ▼ Show 20 Lines
	- Clang's profile files generated through ``-fprofile-instr-generate`` are using			- Clang's profile files generated through ``-fprofile-instr-generate`` are using
	a fixed hashing algorithm that prevents some collision when loading			a fixed hashing algorithm that prevents some collision when loading
	out-of-date profile informations. Clang can still read old profile files.			out-of-date profile informations. Clang can still read old profile files.

	New Compiler Flags			New Compiler Flags
	------------------			------------------

	- -fstack-clash-protection will provide a protection against the stack clash			- -fstack-clash-protection will provide a protection against the stack clash
	attack for x86 architecture through automatic probing of each page of			attack for x86 and s390x architectures through automatic probing of each page
	allocated stack.			of allocated stack.

	- -ffp-exception-behavior={ignore,maytrap,strict} allows the user to specify			- -ffp-exception-behavior={ignore,maytrap,strict} allows the user to specify
	the floating-point exception behavior. The default setting is ``ignore``.			the floating-point exception behavior. The default setting is ``ignore``.

	- -ffp-model={precise,strict,fast} provides the user an umbrella option to			- -ffp-model={precise,strict,fast} provides the user an umbrella option to
	simplify access to the many single purpose floating point options. The default			simplify access to the many single purpose floating point options. The default
	setting is ``precise``.			setting is ``precise``.

	▲ Show 20 Lines • Show All 338 Lines • Show Last 20 Lines

clang/lib/Basic/Targets/SystemZ.h

Show First 20 Lines • Show All 58 Lines • ▼ Show 20 Lines	public:

ArrayRef<TargetInfo::GCCRegAlias> getGCCRegAliases() const override {		ArrayRef<TargetInfo::GCCRegAlias> getGCCRegAliases() const override {
// No aliases.		// No aliases.
return None;		return None;
}		}

ArrayRef<TargetInfo::AddlRegName> getGCCAddlRegNames() const override;		ArrayRef<TargetInfo::AddlRegName> getGCCAddlRegNames() const override;

		bool isSPRegName(StringRef RegName) const override {
		return RegName.equals("r15");
		}

bool validateAsmConstraint(const char *&Name,		bool validateAsmConstraint(const char *&Name,
TargetInfo::ConstraintInfo &info) const override;		TargetInfo::ConstraintInfo &info) const override;

const char *getClobbers() const override {		const char *getClobbers() const override {
// FIXME: Is this really right?		// FIXME: Is this really right?
return "";		return "";
}		}

▲ Show 20 Lines • Show All 84 Lines • Show Last 20 Lines

clang/lib/Driver/ToolChains/Clang.cpp

	Show First 20 Lines • Show All 2,991 Lines • ▼ Show 20 Lines

	static void RenderSCPOptions(const ToolChain &TC, const ArgList &Args,			static void RenderSCPOptions(const ToolChain &TC, const ArgList &Args,
	ArgStringList &CmdArgs) {			ArgStringList &CmdArgs) {
	const llvm::Triple &EffectiveTriple = TC.getEffectiveTriple();			const llvm::Triple &EffectiveTriple = TC.getEffectiveTriple();

	if (!EffectiveTriple.isOSLinux())			if (!EffectiveTriple.isOSLinux())
	return;			return;

	if (!EffectiveTriple.isX86())			if (!EffectiveTriple.isX86() && !EffectiveTriple.isSystemZ())
	return;			return;

	if (Args.hasFlag(options::OPT_fstack_clash_protection,			if (Args.hasFlag(options::OPT_fstack_clash_protection,
	options::OPT_fnostack_clash_protection, false))			options::OPT_fnostack_clash_protection, false))
	CmdArgs.push_back("-fstack-clash-protection");			CmdArgs.push_back("-fstack-clash-protection");
	}			}

	static void RenderTrivialAutoVarInitOptions(const Driver &D,			static void RenderTrivialAutoVarInitOptions(const Driver &D,
	▲ Show 20 Lines • Show All 4,137 Lines • Show Last 20 Lines

clang/test/CodeGen/stack-clash-protection.c

	// Check the correct function attributes are generated			// Check the correct function attributes are generated
	// RUN: %clang_cc1 -triple x86_64-linux -O0 -S -emit-llvm -o- %s -fstack-clash-protection | FileCheck %s			// RUN: %clang_cc1 -triple x86_64-linux -O0 -S -emit-llvm -o- %s -fstack-clash-protection | FileCheck %s
				// RUN: %clang_cc1 -triple s390x-linux-gnu -O0 -S -emit-llvm -o- %s -fstack-clash-protection | FileCheck %s

	// CHECK: define void @large_stack() #[[A:.*]] {			// CHECK: define void @large_stack() #[[A:.*]] {
	void large_stack() {			void large_stack() {
	volatile int stack[20000], i;			volatile int stack[20000], i;
	for (i = 0; i < sizeof(stack) / sizeof(int); ++i)			for (i = 0; i < sizeof(stack) / sizeof(int); ++i)
	stack[i] = i;			stack[i] = i;
	}			}

	Show All 12 Lines

clang/test/Driver/stack-clash-protection-02.c

This file was added.

				// RUN: %clang -target s390x-linux-gnu -fstack-clash-protection -### %s 2>&1 | FileCheck %s -check-prefix=SystemZ
				// SystemZ: "-fstack-clash-protection"
				// RUN: %clang -target s390x-linux-gnu -fstack-clash-protection -S -emit-llvm -o %t.ll %s 2>&1 | FileCheck %s -check-prefix=SystemZ-warn
				// SystemZ-warn: warning: Unable to protect inline asm that clobbers stack pointer against stack clash

				int foo(int c) {
				int r;
				__asm__("ag %%r15, %0"
				:
				: "rm"(c)
				: "r15");
				return r;
				}

llvm/include/llvm/ADT/Triple.h

Show First 20 Lines • Show All 733 Lines • ▼ Show 20 Lines	bool isPPC64() const {
return getArch() == Triple::ppc64 || getArch() == Triple::ppc64le;		return getArch() == Triple::ppc64 || getArch() == Triple::ppc64le;
}		}

/// Tests whether the target is RISC-V (32- and 64-bit).		/// Tests whether the target is RISC-V (32- and 64-bit).
bool isRISCV() const {		bool isRISCV() const {
return getArch() == Triple::riscv32 || getArch() == Triple::riscv64;		return getArch() == Triple::riscv32 || getArch() == Triple::riscv64;
}		}

		/// Tests whether the target is SystemZ.
		bool isSystemZ() const {
		return getArch() == Triple::systemz;
		}

/// Tests whether the target is x86 (32- or 64-bit).		/// Tests whether the target is x86 (32- or 64-bit).
bool isX86() const {		bool isX86() const {
return getArch() == Triple::x86 || getArch() == Triple::x86_64;		return getArch() == Triple::x86 || getArch() == Triple::x86_64;
}		}

/// Tests whether the target is VE		/// Tests whether the target is VE
bool isVE() const {		bool isVE() const {
return getArch() == Triple::ve;		return getArch() == Triple::ve;
▲ Show 20 Lines • Show All 156 Lines • Show Last 20 Lines

llvm/lib/Target/SystemZ/SystemZFrameLowering.h

Show All 37 Lines	public:
restoreCalleeSavedRegisters(MachineBasicBlock &MBB,		restoreCalleeSavedRegisters(MachineBasicBlock &MBB,
MachineBasicBlock::iterator MBBII,		MachineBasicBlock::iterator MBBII,
MutableArrayRef<CalleeSavedInfo> CSI,		MutableArrayRef<CalleeSavedInfo> CSI,
const TargetRegisterInfo *TRI) const override;		const TargetRegisterInfo *TRI) const override;
void processFunctionBeforeFrameFinalized(MachineFunction &MF,		void processFunctionBeforeFrameFinalized(MachineFunction &MF,
RegScavenger *RS) const override;		RegScavenger *RS) const override;
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override;		void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override;
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override;		void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override;
		void inlineStackProbe(MachineFunction &MF,
		MachineBasicBlock &PrologMBB) const override;
bool hasFP(const MachineFunction &MF) const override;		bool hasFP(const MachineFunction &MF) const override;
bool hasReservedCallFrame(const MachineFunction &MF) const override;		bool hasReservedCallFrame(const MachineFunction &MF) const override;
int getFrameIndexReference(const MachineFunction &MF, int FI,		int getFrameIndexReference(const MachineFunction &MF, int FI,
Register &FrameReg) const override;		Register &FrameReg) const override;
MachineBasicBlock::iterator		MachineBasicBlock::iterator
eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,		eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
MachineBasicBlock::iterator MI) const override;		MachineBasicBlock::iterator MI) const override;

Show All 13 Lines

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp

Show First 20 Lines • Show All 368 Lines • ▼ Show 20 Lines	while (NumBytes) {
MachineInstr *MI = BuildMI(MBB, MBBI, DL, TII->get(Opcode), Reg)		MachineInstr *MI = BuildMI(MBB, MBBI, DL, TII->get(Opcode), Reg)
.addReg(Reg).addImm(ThisVal);		.addReg(Reg).addImm(ThisVal);
// The CC implicit def is dead.		// The CC implicit def is dead.
MI->getOperand(3).setIsDead();		MI->getOperand(3).setIsDead();
NumBytes -= ThisVal;		NumBytes -= ThisVal;
}		}
}		}

		// Add CFI for the new CFA offset.
		static void buildCFAOffs(MachineBasicBlock &MBB,
		MachineBasicBlock::iterator MBBI,
		const DebugLoc &DL, int Offset,
		const SystemZInstrInfo *ZII) {
		unsigned CFIIndex = MBB.getParent()->addFrameInst(
		MCCFIInstruction::cfiDefCfaOffset(nullptr, -Offset));
		BuildMI(MBB, MBBI, DL, ZII->get(TargetOpcode::CFI_INSTRUCTION))
		.addCFIIndex(CFIIndex);
		}

		// Add CFI for the new frame location.
		static void buildDefCFAReg(MachineBasicBlock &MBB,
		MachineBasicBlock::iterator MBBI,
		const DebugLoc &DL, unsigned Reg,
		const SystemZInstrInfo *ZII) {
		MachineFunction &MF = *MBB.getParent();
		MachineModuleInfo &MMI = MF.getMMI();
		const MCRegisterInfo *MRI = MMI.getContext().getRegisterInfo();
		unsigned RegNum = MRI->getDwarfRegNum(Reg, true);
		unsigned CFIIndex = MF.addFrameInst(
		MCCFIInstruction::createDefCfaRegister(nullptr, RegNum));
		BuildMI(MBB, MBBI, DL, ZII->get(TargetOpcode::CFI_INSTRUCTION))
		.addCFIIndex(CFIIndex);
		}

void SystemZFrameLowering::emitPrologue(MachineFunction &MF,		void SystemZFrameLowering::emitPrologue(MachineFunction &MF,
MachineBasicBlock &MBB) const {		MachineBasicBlock &MBB) const {
assert(&MF.front() == &MBB && "Shrink-wrapping not yet supported");		assert(&MF.front() == &MBB && "Shrink-wrapping not yet supported");
		const SystemZSubtarget &STI = MF.getSubtarget<SystemZSubtarget>();
		const SystemZTargetLowering &TLI = *STI.getTargetLowering();
MachineFrameInfo &MFFrame = MF.getFrameInfo();		MachineFrameInfo &MFFrame = MF.getFrameInfo();
auto *ZII =		auto *ZII = static_cast<const SystemZInstrInfo *>(STI.getInstrInfo());
static_cast<const SystemZInstrInfo *>(MF.getSubtarget().getInstrInfo());
		uweigand: I believe this will now fit onto a single line.
SystemZMachineFunctionInfo *ZFI = MF.getInfo<SystemZMachineFunctionInfo>();		SystemZMachineFunctionInfo *ZFI = MF.getInfo<SystemZMachineFunctionInfo>();
MachineBasicBlock::iterator MBBI = MBB.begin();		MachineBasicBlock::iterator MBBI = MBB.begin();
MachineModuleInfo &MMI = MF.getMMI();		MachineModuleInfo &MMI = MF.getMMI();
const MCRegisterInfo *MRI = MMI.getContext().getRegisterInfo();		const MCRegisterInfo *MRI = MMI.getContext().getRegisterInfo();
const std::vector<CalleeSavedInfo> &CSI = MFFrame.getCalleeSavedInfo();		const std::vector<CalleeSavedInfo> &CSI = MFFrame.getCalleeSavedInfo();
bool HasFP = hasFP(MF);		bool HasFP = hasFP(MF);

// In GHC calling convention C stack space, including the ABI-defined		// In GHC calling convention C stack space, including the ABI-defined
▲ Show 20 Lines • Show All 66 Lines • ▼ Show 20 Lines	if (StackSize) {
// If we need backchain, save current stack pointer. R1 is free at this		// If we need backchain, save current stack pointer. R1 is free at this
// point.		// point.
if (StoreBackchain)		if (StoreBackchain)
BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::LGR))		BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::LGR))
.addReg(SystemZ::R1D, RegState::Define).addReg(SystemZ::R15D);		.addReg(SystemZ::R1D, RegState::Define).addReg(SystemZ::R15D);

// Allocate StackSize bytes.		// Allocate StackSize bytes.
int64_t Delta = -int64_t(StackSize);		int64_t Delta = -int64_t(StackSize);
		const unsigned ProbeSize = TLI.getStackProbeSize(MF);
		bool FreeProbe = (ZFI->getSpillGPRRegs().GPROffset &&
		(ZFI->getSpillGPRRegs().GPROffset + StackSize) < ProbeSize);
		if (!FreeProbe &&
		MF.getSubtarget().getTargetLowering()->hasInlineStackProbe(MF)) {
		// Stack probing may involve looping, but splitting the prologue block
		// is not possible at this point since it would invalidate the
		// SaveBlocks / RestoreBlocks sets of PEI in the single block function
		// case. Build a pseudo to be handled later by inlineStackProbe().
		BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::PROBED_STACKALLOC))
		.addImm(StackSize);
		}
		else {
emitIncrement(MBB, MBBI, DL, SystemZ::R15D, Delta, ZII);		emitIncrement(MBB, MBBI, DL, SystemZ::R15D, Delta, ZII);
		buildCFAOffs(MBB, MBBI, DL, SPOffsetFromCFA + Delta, ZII);
// Add CFI for the allocation.		}
unsigned CFIIndex = MF.addFrameInst(
MCCFIInstruction::cfiDefCfaOffset(nullptr, -SPOffsetFromCFA - Delta));
BuildMI(MBB, MBBI, DL, ZII->get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex);
SPOffsetFromCFA += Delta;		SPOffsetFromCFA += Delta;

if (StoreBackchain) {		if (StoreBackchain) {
// The back chain is stored topmost with packed-stack.		// The back chain is stored topmost with packed-stack.
int Offset = usePackedStack(MF) ? SystemZMC::CallFrameSize - 8 : 0;		int Offset = usePackedStack(MF) ? SystemZMC::CallFrameSize - 8 : 0;
BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::STG))		BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::STG))
.addReg(SystemZ::R1D, RegState::Kill).addReg(SystemZ::R15D)		.addReg(SystemZ::R1D, RegState::Kill).addReg(SystemZ::R15D)
.addImm(Offset).addReg(0);		.addImm(Offset).addReg(0);
}		}
}		}

if (HasFP) {		if (HasFP) {
// Copy the base of the frame to R11.		// Copy the base of the frame to R11.
BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::LGR), SystemZ::R11D)		BuildMI(MBB, MBBI, DL, ZII->get(SystemZ::LGR), SystemZ::R11D)
.addReg(SystemZ::R15D);		.addReg(SystemZ::R15D);

// Add CFI for the new frame location.		// Add CFI for the new frame location.
unsigned HardFP = MRI->getDwarfRegNum(SystemZ::R11D, true);		buildDefCFAReg(MBB, MBBI, DL, SystemZ::R11D, ZII);
unsigned CFIIndex = MF.addFrameInst(
MCCFIInstruction::createDefCfaRegister(nullptr, HardFP));
BuildMI(MBB, MBBI, DL, ZII->get(TargetOpcode::CFI_INSTRUCTION))
.addCFIIndex(CFIIndex);

// Mark the FramePtr as live at the beginning of every block except		// Mark the FramePtr as live at the beginning of every block except
// the entry block. (We'll have marked R11 as live on entry when		// the entry block. (We'll have marked R11 as live on entry when
// saving the GPRs.)		// saving the GPRs.)
for (auto I = std::next(MF.begin()), E = MF.end(); I != E; ++I)		for (auto I = std::next(MF.begin()), E = MF.end(); I != E; ++I)
I->addLiveIn(SystemZ::R11D);		I->addLiveIn(SystemZ::R11D);
}		}

▲ Show 20 Lines • Show All 76 Lines • ▼ Show 20 Lines	if (ZFI->getRestoreGPRRegs().LowGPR) {
MBBI->setDesc(ZII->get(NewOpcode));		MBBI->setDesc(ZII->get(NewOpcode));
MBBI->getOperand(AddrOpNo + 1).ChangeToImmediate(Offset);		MBBI->getOperand(AddrOpNo + 1).ChangeToImmediate(Offset);
} else if (StackSize) {		} else if (StackSize) {
DebugLoc DL = MBBI->getDebugLoc();		DebugLoc DL = MBBI->getDebugLoc();
emitIncrement(MBB, MBBI, DL, SystemZ::R15D, StackSize, ZII);		emitIncrement(MBB, MBBI, DL, SystemZ::R15D, StackSize, ZII);
}		}
}		}

		void SystemZFrameLowering::inlineStackProbe(MachineFunction &MF,
		MachineBasicBlock &PrologMBB) const {
		auto *ZII =
		static_cast<const SystemZInstrInfo *>(MF.getSubtarget().getInstrInfo());
		const SystemZSubtarget &STI = MF.getSubtarget<SystemZSubtarget>();
		const SystemZTargetLowering &TLI = *STI.getTargetLowering();

		MachineInstr *StackAllocMI = nullptr;
		serge-sans-paille: could be: auto Where = llvm::find_if(PrologMBB, [](MachineInstr& MI) { return MI.getOpcode() == SystemZ::STACKALLOC_W_PROBING;}); if(Where == PrologMBB.end()) return; MachineInstr &StackAllocMI = *Where;
		for (MachineInstr &MI : PrologMBB)
		if (MI.getOpcode() == SystemZ::PROBED_STACKALLOC) {
		StackAllocMI = &MI;
		break;
		}
		if (StackAllocMI == nullptr)
		return;
		uint64_t StackSize = StackAllocMI->getOperand(0).getImm();
		const unsigned ProbeSize = TLI.getStackProbeSize(MF);
		uint64_t NumFullBlocks = StackSize / ProbeSize;
		uint64_t Residual = StackSize % ProbeSize;
		int64_t SPOffsetFromCFA = -SystemZMC::CFAOffsetFromInitialSP;
		MachineBasicBlock *MBB = &PrologMBB;
		MachineBasicBlock::iterator MBBI = StackAllocMI;
		const DebugLoc DL = StackAllocMI->getDebugLoc();

		// Allocate a block of Size bytes on the stack and probe it.
		auto allocateAndProbe = [&](MachineBasicBlock &InsMBB,
		MachineBasicBlock::iterator InsPt, unsigned Size,
		bool EmitCFI) -> void {
		emitIncrement(InsMBB, InsPt, DL, SystemZ::R15D, -int64_t(Size), ZII);
		if (EmitCFI) {
		SPOffsetFromCFA -= Size;
		buildCFAOffs(InsMBB, InsPt, DL, SPOffsetFromCFA, ZII);
		}
		// Probe by means of a volatile compare.
		MachineMemOperand *MMO = MF.getMachineMemOperand(MachinePointerInfo(),
		MachineMemOperand::MOVolatile | MachineMemOperand::MOLoad, 8, Align(1));
		BuildMI(InsMBB, InsPt, DL, ZII->get(SystemZ::CG))
		.addReg(SystemZ::R0D, RegState::Undef)
		.addReg(SystemZ::R15D).addImm(Size - 8).addReg(0)
		.addMemOperand(MMO);
		};

		if (NumFullBlocks < 3) {
		// Emit unrolled probe statements.
		for (unsigned int i = 0; i < NumFullBlocks; i++)
		allocateAndProbe(*MBB, MBBI, ProbeSize, true/*EmitCFI*/);
		} else {
		// Emit a loop probing the pages.
		uint64_t LoopAlloc = ProbeSize * NumFullBlocks;
		SPOffsetFromCFA -= LoopAlloc;

		BuildMI(*MBB, MBBI, DL, ZII->get(SystemZ::LGR), SystemZ::R1D)
		.addReg(SystemZ::R15D);
		buildDefCFAReg(*MBB, MBBI, DL, SystemZ::R1D, ZII);
		serge-sans-paille: Maybe we should syndicate that unroll parameter somewhere across architectures?
		jonpa: I am not sure I see the benefit of that since this entire method is already defined by the target.
		emitIncrement(*MBB, MBBI, DL, SystemZ::R1D, -int64_t(LoopAlloc), ZII);
		buildCFAOffs(*MBB, MBBI, DL, -int64_t(SystemZMC::CallFrameSize + LoopAlloc),
		ZII);

		MachineBasicBlock *DoneMBB = SystemZ::splitBlockBefore(MBBI, MBB);
		MachineBasicBlock *LoopMBB = SystemZ::emitBlockAfter(MBB);
		MBB->addSuccessor(LoopMBB);
		LoopMBB->addSuccessor(LoopMBB);
		LoopMBB->addSuccessor(DoneMBB);

		MBB = LoopMBB;
		allocateAndProbe(*MBB, MBB->end(), ProbeSize, false/*EmitCFI*/);
		BuildMI(*MBB, MBB->end(), DL, ZII->get(SystemZ::CLGR))
		.addReg(SystemZ::R15D).addReg(SystemZ::R1D);
		BuildMI(*MBB, MBB->end(), DL, ZII->get(SystemZ::BRC))
		.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_GT).addMBB(MBB);

		MBB = DoneMBB;
		MBBI = DoneMBB->begin();
		buildDefCFAReg(*MBB, MBBI, DL, SystemZ::R15D, ZII);

		recomputeLiveIns(*DoneMBB);
		recomputeLiveIns(*LoopMBB);
		}

		if (Residual)
		allocateAndProbe(*MBB, MBBI, Residual, true/*EmitCFI*/);

		StackAllocMI->eraseFromParent();
		}

bool SystemZFrameLowering::hasFP(const MachineFunction &MF) const {		bool SystemZFrameLowering::hasFP(const MachineFunction &MF) const {
return (MF.getTarget().Options.DisableFramePointerElim(MF) ||		return (MF.getTarget().Options.DisableFramePointerElim(MF) ||
MF.getFrameInfo().hasVarSizedObjects() ||		MF.getFrameInfo().hasVarSizedObjects() ||
MF.getInfo<SystemZMachineFunctionInfo>()->getManipulatesSP());		MF.getInfo<SystemZMachineFunctionInfo>()->getManipulatesSP());
}		}

bool		bool
		serge-sans-paille: On X86, we don't need to probe that final allocation because the two ways the stack could grow after the final alloca are:
		1. A function call, and in that case we get a free probe when we make a function call.
		2. A PROBED_ALLOCA, and in that case we get a probe at Residual + PAGE_SIZE, which is right into the Page Guard.
		I assume that's different for SystemZ?
		jonpa: There's not always a free probe with a function call on SystemZ (only if stack space is allocated by the called function), but regardless of that I think the residual is needed. Since the probe is done on the topmost byte of each allocated block on SystemZ, the guard page could fit within the last allocated full block and a residual of just 8 or more bytes (probing the high byte is what GCC is doing on SystemZ). I still wonder if that would not be necessary also on X86 in case "2": given that on X86 you probe the lower byte of each full block, you could get a residual into the guard page, and then if the dynamic alloca started with a full block, the next probe would not be into the guard page, but past it: | | |GGGGG| | | P| R| P| Maybe I am missing something?
		serge-sans-paille: I took my pen and paper, and you're definitely right. Fortunately, there's the possibility of always probing the upper byte when doing a dynamic alloca, so that we always avoid this extra probe. That way the common case (alloc_size < PAGE_SIZE) remains costless. Correct?
		jonpa: IIUC, the topmost byte of each allocated stack range is always naturally probed before the entry of the prologue by the push of the return address by the call. The lowermost byte is then probed in each full block, but the residual needs no probing. In order to get a matching behavior with dynamic allocas, the topmost byte should always be probed, as well as the lowermost byte of each full block. So yes, I think that would be correct as long as there are no other allocations of stack space anywhere...
SystemZFrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {		SystemZFrameLowering::hasReservedCallFrame(const MachineFunction &MF) const {
// The ABI requires us to allocate 160 bytes of stack space for the callee,		// The ABI requires us to allocate 160 bytes of stack space for the callee,
// with any outgoing stack arguments being placed above that. It seems		// with any outgoing stack arguments being placed above that. It seems
// better to make that area a permanent feature of the frame even if		// better to make that area a permanent feature of the frame even if
// we're using a frame pointer.		// we're using a frame pointer.
return true;		return true;
}		}

▲ Show 20 Lines • Show All 67 Lines • Show Last 20 Lines

llvm/lib/Target/SystemZ/SystemZISelLowering.h

Show First 20 Lines • Show All 77 Lines • ▼ Show 20 Lines	enum NodeType : unsigned {
// chosen over operand 1; it has the same form as BR_CCMASK.		// chosen over operand 1; it has the same form as BR_CCMASK.
// Operand 3 is the flag operand.		// Operand 3 is the flag operand.
SELECT_CCMASK,		SELECT_CCMASK,

// Evaluates to the gap between the stack pointer and the		// Evaluates to the gap between the stack pointer and the
// base of the dynamically-allocatable area.		// base of the dynamically-allocatable area.
ADJDYNALLOC,		ADJDYNALLOC,

		// For allocating stack space when using stack clash protector.
		// Allocation is performed by block, and each block is probed.
		PROBED_ALLOCA,

// Count number of bits set in operand 0 per byte.		// Count number of bits set in operand 0 per byte.
POPCNT,		POPCNT,

// Wrappers around the ISD opcodes of the same name. The output is GR128.		// Wrappers around the ISD opcodes of the same name. The output is GR128.
// Input operands may be GR64 or GR32, depending on the instruction.		// Input operands may be GR64 or GR32, depending on the instruction.
SMUL_LOHI,		SMUL_LOHI,
UMUL_LOHI,		UMUL_LOHI,
SDIVREM,		SDIVREM,
▲ Show 20 Lines • Show All 329 Lines • ▼ Show 20 Lines	public:
}		}
bool isCheapToSpeculateCtlz() const override { return true; }		bool isCheapToSpeculateCtlz() const override { return true; }
EVT getSetCCResultType(const DataLayout &DL, LLVMContext &,		EVT getSetCCResultType(const DataLayout &DL, LLVMContext &,
EVT) const override;		EVT) const override;
bool isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,		bool isFMAFasterThanFMulAndFAdd(const MachineFunction &MF,
EVT VT) const override;		EVT VT) const override;
bool isFPImmLegal(const APFloat &Imm, EVT VT,		bool isFPImmLegal(const APFloat &Imm, EVT VT,
bool ForCodeSize) const override;		bool ForCodeSize) const override;
		bool hasInlineStackProbe(MachineFunction &MF) const override;
bool isLegalICmpImmediate(int64_t Imm) const override;		bool isLegalICmpImmediate(int64_t Imm) const override;
bool isLegalAddImmediate(int64_t Imm) const override;		bool isLegalAddImmediate(int64_t Imm) const override;
bool isLegalAddressingMode(const DataLayout &DL, const AddrMode &AM, Type *Ty,		bool isLegalAddressingMode(const DataLayout &DL, const AddrMode &AM, Type *Ty,
unsigned AS,		unsigned AS,
Instruction *I = nullptr) const override;		Instruction *I = nullptr) const override;
bool allowsMisalignedMemoryAccesses(EVT VT, unsigned AS,		bool allowsMisalignedMemoryAccesses(EVT VT, unsigned AS,
unsigned Align,		unsigned Align,
MachineMemOperand::Flags Flags,		MachineMemOperand::Flags Flags,
▲ Show 20 Lines • Show All 112 Lines • ▼ Show 20 Lines	public:
ISD::NodeType getExtendForAtomicOps() const override {		ISD::NodeType getExtendForAtomicOps() const override {
return ISD::ANY_EXTEND;		return ISD::ANY_EXTEND;
}		}

bool supportSwiftError() const override {		bool supportSwiftError() const override {
return true;		return true;
}		}

		unsigned getStackProbeSize(MachineFunction &MF) const;

private:		private:
const SystemZSubtarget &Subtarget;		const SystemZSubtarget &Subtarget;

// Implement LowerOperation for individual opcodes.		// Implement LowerOperation for individual opcodes.
SDValue getVectorCmp(SelectionDAG &DAG, unsigned Opcode,		SDValue getVectorCmp(SelectionDAG &DAG, unsigned Opcode,
const SDLoc &DL, EVT VT,		const SDLoc &DL, EVT VT,
SDValue CmpOp0, SDValue CmpOp1, SDValue Chain) const;		SDValue CmpOp0, SDValue CmpOp1, SDValue Chain) const;
SDValue lowerVectorSETCC(SelectionDAG &DAG, const SDLoc &DL,		SDValue lowerVectorSETCC(SelectionDAG &DAG, const SDLoc &DL,
▲ Show 20 Lines • Show All 119 Lines • ▼ Show 20 Lines	private:
MachineBasicBlock *emitStringWrapper(MachineInstr &MI, MachineBasicBlock *BB,		MachineBasicBlock *emitStringWrapper(MachineInstr &MI, MachineBasicBlock *BB,
unsigned Opcode) const;		unsigned Opcode) const;
MachineBasicBlock *emitTransactionBegin(MachineInstr &MI,		MachineBasicBlock *emitTransactionBegin(MachineInstr &MI,
MachineBasicBlock *MBB,		MachineBasicBlock *MBB,
unsigned Opcode, bool NoFloat) const;		unsigned Opcode, bool NoFloat) const;
MachineBasicBlock *emitLoadAndTestCmp0(MachineInstr &MI,		MachineBasicBlock *emitLoadAndTestCmp0(MachineInstr &MI,
MachineBasicBlock *MBB,		MachineBasicBlock *MBB,
unsigned Opcode) const;		unsigned Opcode) const;
		MachineBasicBlock *emitProbedAlloca(MachineInstr &MI,
		MachineBasicBlock *MBB) const;

MachineMemOperand::Flags		MachineMemOperand::Flags
getTargetMMOFlags(const Instruction &I) const override;		getTargetMMOFlags(const Instruction &I) const override;
const TargetRegisterClass *getRepRegClassFor(MVT VT) const override;		const TargetRegisterClass *getRepRegClassFor(MVT VT) const override;
};		};

struct SystemZVectorConstantInfo {		struct SystemZVectorConstantInfo {
private:		private:
Show All 18 Lines

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

Show First 20 Lines • Show All 820 Lines • ▼ Show 20 Lines	bool SystemZTargetLowering::isFPImmLegal(const APFloat &Imm, EVT VT,
bool ForCodeSize) const {		bool ForCodeSize) const {
// We can load zero using LZ?R and negative zero using LZ?R;LC?BR.		// We can load zero using LZ?R and negative zero using LZ?R;LC?BR.
if (Imm.isZero() || Imm.isNegZero())		if (Imm.isZero() || Imm.isNegZero())
return true;		return true;

return SystemZVectorConstantInfo(Imm).isVectorConstantLegal(Subtarget);		return SystemZVectorConstantInfo(Imm).isVectorConstantLegal(Subtarget);
}		}

		/// Returns true if stack probing through inline assembly is requested.
		bool SystemZTargetLowering::hasInlineStackProbe(MachineFunction &MF) const {
		// If the function specifically requests inline stack probes, emit them.
		if (MF.getFunction().hasFnAttribute("probe-stack"))
		return MF.getFunction().getFnAttribute("probe-stack").getValueAsString() ==
		"inline-asm";
		return false;
		}

bool SystemZTargetLowering::isLegalICmpImmediate(int64_t Imm) const {		bool SystemZTargetLowering::isLegalICmpImmediate(int64_t Imm) const {
// We can use CGFI or CLGFI.		// We can use CGFI or CLGFI.
return isInt<32>(Imm) || isUInt<32>(Imm);		return isInt<32>(Imm) || isUInt<32>(Imm);
}		}

bool SystemZTargetLowering::isLegalAddImmediate(int64_t Imm) const {		bool SystemZTargetLowering::isLegalAddImmediate(int64_t Imm) const {
// We can use ALGFI or SLGFI.		// We can use ALGFI or SLGFI.
return isUInt<32>(Imm) || isUInt<32>(-Imm);		return isUInt<32>(Imm) || isUInt<32>(-Imm);
▲ Show 20 Lines • Show All 2,586 Lines • ▼ Show 20 Lines	if (StoreBackchain)
Backchain = DAG.getLoad(MVT::i64, DL, Chain, OldSP, MachinePointerInfo());		Backchain = DAG.getLoad(MVT::i64, DL, Chain, OldSP, MachinePointerInfo());

// Add extra space for alignment if needed.		// Add extra space for alignment if needed.
if (ExtraAlignSpace)		if (ExtraAlignSpace)
NeededSpace = DAG.getNode(ISD::ADD, DL, MVT::i64, NeededSpace,		NeededSpace = DAG.getNode(ISD::ADD, DL, MVT::i64, NeededSpace,
DAG.getConstant(ExtraAlignSpace, DL, MVT::i64));		DAG.getConstant(ExtraAlignSpace, DL, MVT::i64));

// Get the new stack pointer value.		// Get the new stack pointer value.
SDValue NewSP = DAG.getNode(ISD::SUB, DL, MVT::i64, OldSP, NeededSpace);		SDValue NewSP;
		if (hasInlineStackProbe(MF)) {
		NewSP = DAG.getNode(SystemZISD::PROBED_ALLOCA, DL,
		DAG.getVTList(MVT::i64, MVT::Other), Chain, OldSP, NeededSpace);
		Chain = NewSP.getValue(1);
		}
		else {
		NewSP = DAG.getNode(ISD::SUB, DL, MVT::i64, OldSP, NeededSpace);
// Copy the new stack pointer back.		// Copy the new stack pointer back.
Chain = DAG.getCopyToReg(Chain, DL, SPReg, NewSP);		Chain = DAG.getCopyToReg(Chain, DL, SPReg, NewSP);
		}

// The allocated data lives above the 160 bytes allocated for the standard		// The allocated data lives above the 160 bytes allocated for the standard
// frame, plus any outgoing stack arguments. We don't know how much that		// frame, plus any outgoing stack arguments. We don't know how much that
// amounts to yet, so emit a special ADJDYNALLOC placeholder.		// amounts to yet, so emit a special ADJDYNALLOC placeholder.
SDValue ArgAdjust = DAG.getNode(SystemZISD::ADJDYNALLOC, DL, MVT::i64);		SDValue ArgAdjust = DAG.getNode(SystemZISD::ADJDYNALLOC, DL, MVT::i64);
SDValue Result = DAG.getNode(ISD::ADD, DL, MVT::i64, NewSP, ArgAdjust);		SDValue Result = DAG.getNode(ISD::ADD, DL, MVT::i64, NewSP, ArgAdjust);

// Dynamically realign if needed.		// Dynamically realign if needed.
[1,952 lines not shown]	switch ((SystemZISD::NodeType)Opcode) {
OPCODE(ICMP);		OPCODE(ICMP);
OPCODE(FCMP);		OPCODE(FCMP);
OPCODE(STRICT_FCMP);		OPCODE(STRICT_FCMP);
OPCODE(STRICT_FCMPS);		OPCODE(STRICT_FCMPS);
OPCODE(TM);		OPCODE(TM);
OPCODE(BR_CCMASK);		OPCODE(BR_CCMASK);
OPCODE(SELECT_CCMASK);		OPCODE(SELECT_CCMASK);
OPCODE(ADJDYNALLOC);		OPCODE(ADJDYNALLOC);
		OPCODE(PROBED_ALLOCA);
OPCODE(POPCNT);		OPCODE(POPCNT);
OPCODE(SMUL_LOHI);		OPCODE(SMUL_LOHI);
OPCODE(UMUL_LOHI);		OPCODE(UMUL_LOHI);
OPCODE(SDIVREM);		OPCODE(SDIVREM);
OPCODE(UDIVREM);		OPCODE(UDIVREM);
OPCODE(SADDO);		OPCODE(SADDO);
OPCODE(SSUBO);		OPCODE(SSUBO);
OPCODE(UADDO);		OPCODE(UADDO);
[1,409 lines not shown]	if (Opcode == ISD::INTRINSIC_WO_CHAIN) {
default:		default:
break;		break;
}		}
}		}

return 1;		return 1;
}		}

		unsigned
		SystemZTargetLowering::getStackProbeSize(MachineFunction &MF) const {
		const TargetFrameLowering *TFI = Subtarget.getFrameLowering();
		unsigned StackAlign = TFI->getStackAlignment();
		assert(StackAlign >=1 && isPowerOf2_32(StackAlign) &&
		"Unexpected stack alignment");
		// The default stack probe size is 4096 if the function has no
		// stack-probe-size attribute.
		unsigned StackProbeSize = 4096;
		const Function &Fn = MF.getFunction();
		if (Fn.hasFnAttribute("stack-probe-size"))
		Fn.getFnAttribute("stack-probe-size")
		.getValueAsString()
		.getAsInteger(0, StackProbeSize);
		// Round down to the stack alignment.
		StackProbeSize &= ~(StackAlign - 1);
		uweigand (inline comment, done): The RHS can be simplified to ~(StackAlign - 1)
		return StackProbeSize ? StackProbeSize : StackAlign;
		}
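
The rounding in getStackProbeSize() can be tried in isolation. The following is a minimal standalone sketch, not LLVM code: the helper name roundProbeSize is hypothetical, and the 8-byte alignment used in the usage note is an assumption about the SystemZ stack alignment rather than something stated in this diff.

```cpp
#include <cassert>

// Hypothetical standalone helper mirroring the rounding step above.
// StackAlign must be a power of two, as the patch asserts.
unsigned roundProbeSize(unsigned ProbeSize, unsigned StackAlign) {
  assert(StackAlign >= 1 && (StackAlign & (StackAlign - 1)) == 0 &&
         "Unexpected stack alignment");
  // Clear the low bits so the result is a multiple of StackAlign.
  ProbeSize &= ~(StackAlign - 1);
  // A request smaller than the alignment rounds to zero; fall back to
  // the alignment itself so the probe interval is never zero.
  return ProbeSize ? ProbeSize : StackAlign;
}
```

Under the assumed 8-byte alignment, a "stack-probe-size" of 1250 (the value used by fun1 in the new test) would round down to 1248, while a value below 8 would fall back to 8.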

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Custom insertion		// Custom insertion
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

// Create a new basic block after MBB.
static MachineBasicBlock *emitBlockAfter(MachineBasicBlock *MBB) {
MachineFunction &MF = *MBB->getParent();
MachineBasicBlock *NewMBB = MF.CreateMachineBasicBlock(MBB->getBasicBlock());
MF.insert(std::next(MachineFunction::iterator(MBB)), NewMBB);
return NewMBB;
}

// Split MBB after MI and return the new block (the one that contains
// instructions after MI).
static MachineBasicBlock *splitBlockAfter(MachineBasicBlock::iterator MI,
MachineBasicBlock *MBB) {
MachineBasicBlock *NewMBB = emitBlockAfter(MBB);
NewMBB->splice(NewMBB->begin(), MBB,
std::next(MachineBasicBlock::iterator(MI)), MBB->end());
NewMBB->transferSuccessorsAndUpdatePHIs(MBB);
return NewMBB;
}

// Split MBB before MI and return the new block (the one that contains MI).
static MachineBasicBlock *splitBlockBefore(MachineBasicBlock::iterator MI,
MachineBasicBlock *MBB) {
MachineBasicBlock *NewMBB = emitBlockAfter(MBB);
NewMBB->splice(NewMBB->begin(), MBB, MI, MBB->end());
NewMBB->transferSuccessorsAndUpdatePHIs(MBB);
return NewMBB;
}

// Force base value Base into a register before MI. Return the register.		// Force base value Base into a register before MI. Return the register.
static Register forceReg(MachineInstr &MI, MachineOperand &Base,		static Register forceReg(MachineInstr &MI, MachineOperand &Base,
const SystemZInstrInfo *TII) {		const SystemZInstrInfo *TII) {
if (Base.isReg())		if (Base.isReg())
return Base.getReg();		return Base.getReg();

MachineBasicBlock *MBB = MI.getParent();		MachineBasicBlock *MBB = MI.getParent();
MachineFunction &MF = *MBB->getParent();		MachineFunction &MF = *MBB->getParent();
[154 lines not shown]	for (MachineBasicBlock::iterator NextMIIt =
else if (User || ++Count > 20)		else if (User || ++Count > 20)
break;		break;
}		}

MachineInstr *LastMI = Selects.back();		MachineInstr *LastMI = Selects.back();
bool CCKilled =		bool CCKilled =
(LastMI->killsRegister(SystemZ::CC) || checkCCKill(*LastMI, MBB));		(LastMI->killsRegister(SystemZ::CC) || checkCCKill(*LastMI, MBB));
MachineBasicBlock *StartMBB = MBB;		MachineBasicBlock *StartMBB = MBB;
MachineBasicBlock *JoinMBB = splitBlockAfter(LastMI, MBB);		MachineBasicBlock *JoinMBB = SystemZ::splitBlockAfter(LastMI, MBB);
MachineBasicBlock *FalseMBB = emitBlockAfter(StartMBB);		MachineBasicBlock *FalseMBB = SystemZ::emitBlockAfter(StartMBB);

// Unless CC was killed in the last Select instruction, mark it as		// Unless CC was killed in the last Select instruction, mark it as
// live-in to both FalseMBB and JoinMBB.		// live-in to both FalseMBB and JoinMBB.
if (!CCKilled) {		if (!CCKilled) {
FalseMBB->addLiveIn(SystemZ::CC);		FalseMBB->addLiveIn(SystemZ::CC);
JoinMBB->addLiveIn(SystemZ::CC);		JoinMBB->addLiveIn(SystemZ::CC);
}		}

[76 lines not shown]	if (STOCOpcode && !IndexReg && Subtarget.hasLoadStoreOnCond()) {
return MBB;		return MBB;
}		}

// Get the condition needed to branch around the store.		// Get the condition needed to branch around the store.
if (!Invert)		if (!Invert)
CCMask ^= CCValid;		CCMask ^= CCValid;

MachineBasicBlock *StartMBB = MBB;		MachineBasicBlock *StartMBB = MBB;
MachineBasicBlock *JoinMBB = splitBlockBefore(MI, MBB);		MachineBasicBlock *JoinMBB = SystemZ::splitBlockBefore(MI, MBB);
MachineBasicBlock *FalseMBB = emitBlockAfter(StartMBB);		MachineBasicBlock *FalseMBB = SystemZ::emitBlockAfter(StartMBB);

// Unless CC was killed in the CondStore instruction, mark it as		// Unless CC was killed in the CondStore instruction, mark it as
// live-in to both FalseMBB and JoinMBB.		// live-in to both FalseMBB and JoinMBB.
if (!MI.killsRegister(SystemZ::CC) && !checkCCKill(MI, JoinMBB)) {		if (!MI.killsRegister(SystemZ::CC) && !checkCCKill(MI, JoinMBB)) {
FalseMBB->addLiveIn(SystemZ::CC);		FalseMBB->addLiveIn(SystemZ::CC);
JoinMBB->addLiveIn(SystemZ::CC);		JoinMBB->addLiveIn(SystemZ::CC);
}		}

[66 lines not shown]	MachineBasicBlock *SystemZTargetLowering::emitAtomicLoadBinary(
Register OldVal = MRI.createVirtualRegister(RC);		Register OldVal = MRI.createVirtualRegister(RC);
Register NewVal = (BinOpcode || IsSubWord ?		Register NewVal = (BinOpcode || IsSubWord ?
MRI.createVirtualRegister(RC) : Src2.getReg());		MRI.createVirtualRegister(RC) : Src2.getReg());
Register RotatedOldVal = (IsSubWord ? MRI.createVirtualRegister(RC) : OldVal);		Register RotatedOldVal = (IsSubWord ? MRI.createVirtualRegister(RC) : OldVal);
Register RotatedNewVal = (IsSubWord ? MRI.createVirtualRegister(RC) : NewVal);		Register RotatedNewVal = (IsSubWord ? MRI.createVirtualRegister(RC) : NewVal);

// Insert a basic block for the main loop.		// Insert a basic block for the main loop.
MachineBasicBlock *StartMBB = MBB;		MachineBasicBlock *StartMBB = MBB;
MachineBasicBlock *DoneMBB = splitBlockBefore(MI, MBB);		MachineBasicBlock *DoneMBB = SystemZ::splitBlockBefore(MI, MBB);
MachineBasicBlock *LoopMBB = emitBlockAfter(StartMBB);		MachineBasicBlock *LoopMBB = SystemZ::emitBlockAfter(StartMBB);

// StartMBB:		// StartMBB:
// ...		// ...
// %OrigVal = L Disp(%Base)		// %OrigVal = L Disp(%Base)
// # fall through to LoopMMB		// # fall through to LoopMMB
MBB = StartMBB;		MBB = StartMBB;
BuildMI(MBB, DL, TII->get(LOpcode), OrigVal).add(Base).addImm(Disp).addReg(0);		BuildMI(MBB, DL, TII->get(LOpcode), OrigVal).add(Base).addImm(Disp).addReg(0);
MBB->addSuccessor(LoopMBB);		MBB->addSuccessor(LoopMBB);
[100 lines not shown]	MachineBasicBlock *SystemZTargetLowering::emitAtomicLoadMinMax(
Register OldVal = MRI.createVirtualRegister(RC);		Register OldVal = MRI.createVirtualRegister(RC);
Register NewVal = MRI.createVirtualRegister(RC);		Register NewVal = MRI.createVirtualRegister(RC);
Register RotatedOldVal = (IsSubWord ? MRI.createVirtualRegister(RC) : OldVal);		Register RotatedOldVal = (IsSubWord ? MRI.createVirtualRegister(RC) : OldVal);
Register RotatedAltVal = (IsSubWord ? MRI.createVirtualRegister(RC) : Src2);		Register RotatedAltVal = (IsSubWord ? MRI.createVirtualRegister(RC) : Src2);
Register RotatedNewVal = (IsSubWord ? MRI.createVirtualRegister(RC) : NewVal);		Register RotatedNewVal = (IsSubWord ? MRI.createVirtualRegister(RC) : NewVal);

// Insert 3 basic blocks for the loop.		// Insert 3 basic blocks for the loop.
MachineBasicBlock *StartMBB = MBB;		MachineBasicBlock *StartMBB = MBB;
MachineBasicBlock *DoneMBB = splitBlockBefore(MI, MBB);		MachineBasicBlock *DoneMBB = SystemZ::splitBlockBefore(MI, MBB);
MachineBasicBlock *LoopMBB = emitBlockAfter(StartMBB);		MachineBasicBlock *LoopMBB = SystemZ::emitBlockAfter(StartMBB);
MachineBasicBlock *UseAltMBB = emitBlockAfter(LoopMBB);		MachineBasicBlock *UseAltMBB = SystemZ::emitBlockAfter(LoopMBB);
MachineBasicBlock *UpdateMBB = emitBlockAfter(UseAltMBB);		MachineBasicBlock *UpdateMBB = SystemZ::emitBlockAfter(UseAltMBB);

// StartMBB:		// StartMBB:
// ...		// ...
// %OrigVal = L Disp(%Base)		// %OrigVal = L Disp(%Base)
// # fall through to LoopMMB		// # fall through to LoopMMB
MBB = StartMBB;		MBB = StartMBB;
BuildMI(MBB, DL, TII->get(LOpcode), OrigVal).add(Base).addImm(Disp).addReg(0);		BuildMI(MBB, DL, TII->get(LOpcode), OrigVal).add(Base).addImm(Disp).addReg(0);
MBB->addSuccessor(LoopMBB);		MBB->addSuccessor(LoopMBB);
[91 lines not shown]	SystemZTargetLowering::emitAtomicCmpSwapW(MachineInstr &MI,
Register SwapVal = MRI.createVirtualRegister(RC);		Register SwapVal = MRI.createVirtualRegister(RC);
Register StoreVal = MRI.createVirtualRegister(RC);		Register StoreVal = MRI.createVirtualRegister(RC);
Register RetryOldVal = MRI.createVirtualRegister(RC);		Register RetryOldVal = MRI.createVirtualRegister(RC);
Register RetryCmpVal = MRI.createVirtualRegister(RC);		Register RetryCmpVal = MRI.createVirtualRegister(RC);
Register RetrySwapVal = MRI.createVirtualRegister(RC);		Register RetrySwapVal = MRI.createVirtualRegister(RC);

// Insert 2 basic blocks for the loop.		// Insert 2 basic blocks for the loop.
MachineBasicBlock *StartMBB = MBB;		MachineBasicBlock *StartMBB = MBB;
MachineBasicBlock *DoneMBB = splitBlockBefore(MI, MBB);		MachineBasicBlock *DoneMBB = SystemZ::splitBlockBefore(MI, MBB);
MachineBasicBlock *LoopMBB = emitBlockAfter(StartMBB);		MachineBasicBlock *LoopMBB = SystemZ::emitBlockAfter(StartMBB);
MachineBasicBlock *SetMBB = emitBlockAfter(LoopMBB);		MachineBasicBlock *SetMBB = SystemZ::emitBlockAfter(LoopMBB);

// StartMBB:		// StartMBB:
// ...		// ...
// %OrigOldVal = L Disp(%Base)		// %OrigOldVal = L Disp(%Base)
// # fall through to LoopMMB		// # fall through to LoopMMB
MBB = StartMBB;		MBB = StartMBB;
BuildMI(MBB, DL, TII->get(LOpcode), OrigOldVal)		BuildMI(MBB, DL, TII->get(LOpcode), OrigOldVal)
.add(Base)		.add(Base)
[143 lines not shown]	MachineBasicBlock *SystemZTargetLowering::emitMemMemWrapper(
uint64_t DestDisp = MI.getOperand(1).getImm();		uint64_t DestDisp = MI.getOperand(1).getImm();
MachineOperand SrcBase = earlyUseOperand(MI.getOperand(2));		MachineOperand SrcBase = earlyUseOperand(MI.getOperand(2));
uint64_t SrcDisp = MI.getOperand(3).getImm();		uint64_t SrcDisp = MI.getOperand(3).getImm();
uint64_t Length = MI.getOperand(4).getImm();		uint64_t Length = MI.getOperand(4).getImm();

// When generating more than one CLC, all but the last will need to		// When generating more than one CLC, all but the last will need to
// branch to the end when a difference is found.		// branch to the end when a difference is found.
MachineBasicBlock *EndMBB = (Length > 256 && Opcode == SystemZ::CLC ?		MachineBasicBlock *EndMBB = (Length > 256 && Opcode == SystemZ::CLC ?
splitBlockAfter(MI, MBB) : nullptr);		SystemZ::splitBlockAfter(MI, MBB) : nullptr);

// Check for the loop form, in which operand 5 is the trip count.		// Check for the loop form, in which operand 5 is the trip count.
if (MI.getNumExplicitOperands() > 5) {		if (MI.getNumExplicitOperands() > 5) {
bool HaveSingleBase = DestBase.isIdenticalTo(SrcBase);		bool HaveSingleBase = DestBase.isIdenticalTo(SrcBase);

Register StartCountReg = MI.getOperand(5).getReg();		Register StartCountReg = MI.getOperand(5).getReg();
Register StartSrcReg = forceReg(MI, SrcBase, TII);		Register StartSrcReg = forceReg(MI, SrcBase, TII);
Register StartDestReg = (HaveSingleBase ? StartSrcReg :		Register StartDestReg = (HaveSingleBase ? StartSrcReg :
forceReg(MI, DestBase, TII));		forceReg(MI, DestBase, TII));

const TargetRegisterClass *RC = &SystemZ::ADDR64BitRegClass;		const TargetRegisterClass *RC = &SystemZ::ADDR64BitRegClass;
Register ThisSrcReg = MRI.createVirtualRegister(RC);		Register ThisSrcReg = MRI.createVirtualRegister(RC);
Register ThisDestReg = (HaveSingleBase ? ThisSrcReg :		Register ThisDestReg = (HaveSingleBase ? ThisSrcReg :
MRI.createVirtualRegister(RC));		MRI.createVirtualRegister(RC));
Register NextSrcReg = MRI.createVirtualRegister(RC);		Register NextSrcReg = MRI.createVirtualRegister(RC);
Register NextDestReg = (HaveSingleBase ? NextSrcReg :		Register NextDestReg = (HaveSingleBase ? NextSrcReg :
MRI.createVirtualRegister(RC));		MRI.createVirtualRegister(RC));

RC = &SystemZ::GR64BitRegClass;		RC = &SystemZ::GR64BitRegClass;
Register ThisCountReg = MRI.createVirtualRegister(RC);		Register ThisCountReg = MRI.createVirtualRegister(RC);
Register NextCountReg = MRI.createVirtualRegister(RC);		Register NextCountReg = MRI.createVirtualRegister(RC);

MachineBasicBlock *StartMBB = MBB;		MachineBasicBlock *StartMBB = MBB;
MachineBasicBlock *DoneMBB = splitBlockBefore(MI, MBB);		MachineBasicBlock *DoneMBB = SystemZ::splitBlockBefore(MI, MBB);
MachineBasicBlock *LoopMBB = emitBlockAfter(StartMBB);		MachineBasicBlock *LoopMBB = SystemZ::emitBlockAfter(StartMBB);
MachineBasicBlock *NextMBB = (EndMBB ? emitBlockAfter(LoopMBB) : LoopMBB);		MachineBasicBlock *NextMBB =
		(EndMBB ? SystemZ::emitBlockAfter(LoopMBB) : LoopMBB);

// StartMBB:		// StartMBB:
// # fall through to LoopMMB		// # fall through to LoopMMB
MBB->addSuccessor(LoopMBB);		MBB->addSuccessor(LoopMBB);

// LoopMBB:		// LoopMBB:
// %ThisDestReg = phi [ %StartDestReg, StartMBB ],		// %ThisDestReg = phi [ %StartDestReg, StartMBB ],
// [ %NextDestReg, NextMBB ]		// [ %NextDestReg, NextMBB ]
[99 lines not shown]	BuildMI(*MBB, MI, DL, TII->get(Opcode))
.addImm(SrcDisp)		.addImm(SrcDisp)
.setMemRefs(MI.memoperands());		.setMemRefs(MI.memoperands());
DestDisp += ThisLength;		DestDisp += ThisLength;
SrcDisp += ThisLength;		SrcDisp += ThisLength;
Length -= ThisLength;		Length -= ThisLength;
// If there's another CLC to go, branch to the end if a difference		// If there's another CLC to go, branch to the end if a difference
// was found.		// was found.
if (EndMBB && Length > 0) {		if (EndMBB && Length > 0) {
MachineBasicBlock *NextMBB = splitBlockBefore(MI, MBB);		MachineBasicBlock *NextMBB = SystemZ::splitBlockBefore(MI, MBB);
BuildMI(MBB, DL, TII->get(SystemZ::BRC))		BuildMI(MBB, DL, TII->get(SystemZ::BRC))
.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_NE)		.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_NE)
.addMBB(EndMBB);		.addMBB(EndMBB);
MBB->addSuccessor(EndMBB);		MBB->addSuccessor(EndMBB);
MBB->addSuccessor(NextMBB);		MBB->addSuccessor(NextMBB);
MBB = NextMBB;		MBB = NextMBB;
}		}
}		}
[23 lines not shown]	MachineBasicBlock *SystemZTargetLowering::emitStringWrapper(
uint64_t CharReg = MI.getOperand(3).getReg();		uint64_t CharReg = MI.getOperand(3).getReg();

const TargetRegisterClass *RC = &SystemZ::GR64BitRegClass;		const TargetRegisterClass *RC = &SystemZ::GR64BitRegClass;
uint64_t This1Reg = MRI.createVirtualRegister(RC);		uint64_t This1Reg = MRI.createVirtualRegister(RC);
uint64_t This2Reg = MRI.createVirtualRegister(RC);		uint64_t This2Reg = MRI.createVirtualRegister(RC);
uint64_t End2Reg = MRI.createVirtualRegister(RC);		uint64_t End2Reg = MRI.createVirtualRegister(RC);

MachineBasicBlock *StartMBB = MBB;		MachineBasicBlock *StartMBB = MBB;
MachineBasicBlock *DoneMBB = splitBlockBefore(MI, MBB);		MachineBasicBlock *DoneMBB = SystemZ::splitBlockBefore(MI, MBB);
MachineBasicBlock *LoopMBB = emitBlockAfter(StartMBB);		MachineBasicBlock *LoopMBB = SystemZ::emitBlockAfter(StartMBB);

// StartMBB:		// StartMBB:
// # fall through to LoopMMB		// # fall through to LoopMMB
MBB->addSuccessor(LoopMBB);		MBB->addSuccessor(LoopMBB);

// LoopMBB:		// LoopMBB:
// %This1Reg = phi [ %Start1Reg, StartMBB ], [ %End1Reg, LoopMBB ]		// %This1Reg = phi [ %Start1Reg, StartMBB ], [ %End1Reg, LoopMBB ]
// %This2Reg = phi [ %Start2Reg, StartMBB ], [ %End2Reg, LoopMBB ]		// %This2Reg = phi [ %Start2Reg, StartMBB ], [ %End2Reg, LoopMBB ]
[94 lines not shown]	MachineBasicBlock *SystemZTargetLowering::emitLoadAndTestCmp0(
BuildMI(*MBB, MI, DL, TII->get(Opcode), DstReg)		BuildMI(*MBB, MI, DL, TII->get(Opcode), DstReg)
.addReg(SrcReg)		.addReg(SrcReg)
.setMIFlags(MI.getFlags());		.setMIFlags(MI.getFlags());
MI.eraseFromParent();		MI.eraseFromParent();

return MBB;		return MBB;
}		}

		MachineBasicBlock *SystemZTargetLowering::emitProbedAlloca(
		MachineInstr &MI, MachineBasicBlock *MBB) const {
		MachineFunction &MF = *MBB->getParent();
		MachineRegisterInfo *MRI = &MF.getRegInfo();
		const SystemZInstrInfo *TII =
		static_cast<const SystemZInstrInfo *>(Subtarget.getInstrInfo());
		DebugLoc DL = MI.getDebugLoc();
		const unsigned ProbeSize = getStackProbeSize(MF);
		Register DstReg = MI.getOperand(0).getReg();
		Register SizeReg = MI.getOperand(2).getReg();

		MachineBasicBlock *StartMBB = MBB;
		MachineBasicBlock *DoneMBB = SystemZ::splitBlockAfter(MI, MBB);
		MachineBasicBlock *LoopTestMBB = SystemZ::emitBlockAfter(StartMBB);
		MachineBasicBlock *LoopBodyMBB = SystemZ::emitBlockAfter(LoopTestMBB);
		MachineBasicBlock *TailTestMBB = SystemZ::emitBlockAfter(LoopBodyMBB);
		MachineBasicBlock *TailMBB = SystemZ::emitBlockAfter(TailTestMBB);

		MachineMemOperand *VolLdMMO = MF.getMachineMemOperand(MachinePointerInfo(),
		MachineMemOperand::MOVolatile | MachineMemOperand::MOLoad, 8, Align(1));

		Register PHIReg = MRI->createVirtualRegister(&SystemZ::ADDR64BitRegClass);
		Register IncReg = MRI->createVirtualRegister(&SystemZ::ADDR64BitRegClass);

		// LoopTestMBB
		// BRC TailTestMBB
		// # fallthrough to LoopBodyMBB
		StartMBB->addSuccessor(LoopTestMBB);
		MBB = LoopTestMBB;
		BuildMI(MBB, DL, TII->get(SystemZ::PHI), PHIReg)
		.addReg(SizeReg)
		.addMBB(StartMBB)
		.addReg(IncReg)
		.addMBB(LoopBodyMBB);
		BuildMI(MBB, DL, TII->get(SystemZ::CLGFI))
		.addReg(PHIReg)
		.addImm(ProbeSize);
		BuildMI(MBB, DL, TII->get(SystemZ::BRC))
		.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_LT)
		.addMBB(TailTestMBB);
		MBB->addSuccessor(LoopBodyMBB);
		MBB->addSuccessor(TailTestMBB);

		// LoopBodyMBB: Allocate and probe by means of a volatile compare.
		// J LoopTestMBB
		MBB = LoopBodyMBB;
		BuildMI(MBB, DL, TII->get(SystemZ::SLGFI), IncReg)
		.addReg(PHIReg)
		.addImm(ProbeSize);
		BuildMI(MBB, DL, TII->get(SystemZ::SLGFI), SystemZ::R15D)
		.addReg(SystemZ::R15D)
		.addImm(ProbeSize);
		BuildMI(MBB, DL, TII->get(SystemZ::CG)).addReg(SystemZ::R15D)
		.addReg(SystemZ::R15D).addImm(ProbeSize - 8).addReg(0)
		.setMemRefs(VolLdMMO);
		BuildMI(MBB, DL, TII->get(SystemZ::J)).addMBB(LoopTestMBB);
		MBB->addSuccessor(LoopTestMBB);

		// TailTestMBB
		// BRC DoneMBB
		// # fallthrough to TailMBB
		MBB = TailTestMBB;
		BuildMI(MBB, DL, TII->get(SystemZ::CGHI))
		.addReg(PHIReg)
		.addImm(0);
		BuildMI(MBB, DL, TII->get(SystemZ::BRC))
		.addImm(SystemZ::CCMASK_ICMP).addImm(SystemZ::CCMASK_CMP_EQ)
		.addMBB(DoneMBB);
		MBB->addSuccessor(TailMBB);
		MBB->addSuccessor(DoneMBB);

		// TailMBB
		// # fallthrough to DoneMBB
		MBB = TailMBB;
		BuildMI(MBB, DL, TII->get(SystemZ::SLGR), SystemZ::R15D)
		.addReg(SystemZ::R15D)
		.addReg(PHIReg);
		BuildMI(MBB, DL, TII->get(SystemZ::CG)).addReg(SystemZ::R15D)
		.addReg(SystemZ::R15D).addImm(-8).addReg(PHIReg)
		.setMemRefs(VolLdMMO);
		MBB->addSuccessor(DoneMBB);

		// DoneMBB
		MBB = DoneMBB;
		BuildMI(*MBB, MBB->begin(), DL, TII->get(TargetOpcode::COPY), DstReg)
		.addReg(SystemZ::R15D);

		MI.eraseFromParent();
		return DoneMBB;
		}
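
To make the block structure of emitProbedAlloca() easier to follow, here is a minimal standalone sketch that simulates the same control flow on plain integers. It is an illustration under assumptions, not part of the patch: ProbeTrace and simulateProbedAlloca are hypothetical names, and the "probe" is only recorded as an address instead of being executed as a volatile compare.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record of what the expanded pseudo would do.
struct ProbeTrace {
  uint64_t FinalSP;                 // stack pointer after the allocation
  std::vector<uint64_t> ProbeAddrs; // addresses touched by the CG probes
};

ProbeTrace simulateProbedAlloca(uint64_t SP, uint64_t Size,
                                uint64_t ProbeSize) {
  ProbeTrace T;
  // LoopTestMBB / LoopBodyMBB: allocate one full ProbeSize chunk per
  // iteration and probe its highest doubleword (ProbeSize - 8 above SP).
  while (Size >= ProbeSize) {
    Size -= ProbeSize;
    SP -= ProbeSize;
    T.ProbeAddrs.push_back(SP + ProbeSize - 8);
  }
  // TailTestMBB / TailMBB: allocate and probe the residual, if any.
  if (Size != 0) {
    SP -= Size;
    T.ProbeAddrs.push_back(SP + Size - 8);
  }
  // DoneMBB: the result is copied from the stack pointer.
  T.FinalSP = SP;
  return T;
}
```

In this model every probe lands 8 bytes below the stack pointer value that was current before the chunk was allocated, so a request of 10000 bytes with a 4096-byte probe size yields two loop probes and one tail probe.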

MachineBasicBlock *SystemZTargetLowering::EmitInstrWithCustomInserter(		MachineBasicBlock *SystemZTargetLowering::EmitInstrWithCustomInserter(
MachineInstr &MI, MachineBasicBlock *MBB) const {		MachineInstr &MI, MachineBasicBlock *MBB) const {
switch (MI.getOpcode()) {		switch (MI.getOpcode()) {
case SystemZ::Select32:		case SystemZ::Select32:
case SystemZ::Select64:		case SystemZ::Select64:
case SystemZ::SelectF32:		case SystemZ::SelectF32:
case SystemZ::SelectF64:		case SystemZ::SelectF64:
case SystemZ::SelectF128:		case SystemZ::SelectF128:
[244 lines not shown]	case SystemZ::TBEGINC:
return emitTransactionBegin(MI, MBB, SystemZ::TBEGINC, true);		return emitTransactionBegin(MI, MBB, SystemZ::TBEGINC, true);
case SystemZ::LTEBRCompare_VecPseudo:		case SystemZ::LTEBRCompare_VecPseudo:
return emitLoadAndTestCmp0(MI, MBB, SystemZ::LTEBR);		return emitLoadAndTestCmp0(MI, MBB, SystemZ::LTEBR);
case SystemZ::LTDBRCompare_VecPseudo:		case SystemZ::LTDBRCompare_VecPseudo:
return emitLoadAndTestCmp0(MI, MBB, SystemZ::LTDBR);		return emitLoadAndTestCmp0(MI, MBB, SystemZ::LTDBR);
case SystemZ::LTXBRCompare_VecPseudo:		case SystemZ::LTXBRCompare_VecPseudo:
return emitLoadAndTestCmp0(MI, MBB, SystemZ::LTXBR);		return emitLoadAndTestCmp0(MI, MBB, SystemZ::LTXBR);

		case SystemZ::PROBED_ALLOCA:
		return emitProbedAlloca(MI, MBB);

case TargetOpcode::STACKMAP:		case TargetOpcode::STACKMAP:
case TargetOpcode::PATCHPOINT:		case TargetOpcode::PATCHPOINT:
return emitPatchPoint(MI, MBB);		return emitPatchPoint(MI, MBB);

default:		default:
llvm_unreachable("Unexpected instr type to insert");		llvm_unreachable("Unexpected instr type to insert");
}		}
}		}
[9 lines not shown]

llvm/lib/Target/SystemZ/SystemZInstrInfo.h

	[153 lines not shown]

	namespace SystemZ {			namespace SystemZ {
	int getTwoOperandOpcode(uint16_t Opcode);			int getTwoOperandOpcode(uint16_t Opcode);
	int getTargetMemOpcode(uint16_t Opcode);			int getTargetMemOpcode(uint16_t Opcode);

	// Return a version of comparison CC mask CCMask in which the LT and GT			// Return a version of comparison CC mask CCMask in which the LT and GT
	// actions are swapped.			// actions are swapped.
	unsigned reverseCCMask(unsigned CCMask);			unsigned reverseCCMask(unsigned CCMask);

				// Create a new basic block after MBB.
				MachineBasicBlock *emitBlockAfter(MachineBasicBlock *MBB);
				// Split MBB after MI and return the new block (the one that contains
				// instructions after MI).
				MachineBasicBlock *splitBlockAfter(MachineBasicBlock::iterator MI,
				MachineBasicBlock *MBB);
				// Split MBB before MI and return the new block (the one that contains MI).
				MachineBasicBlock *splitBlockBefore(MachineBasicBlock::iterator MI,
				MachineBasicBlock *MBB);
	}			}

	class SystemZInstrInfo : public SystemZGenInstrInfo {			class SystemZInstrInfo : public SystemZGenInstrInfo {
	const SystemZRegisterInfo RI;			const SystemZRegisterInfo RI;
	SystemZSubtarget &STI;			SystemZSubtarget &STI;

	void splitMove(MachineBasicBlock::iterator MI, unsigned NewOpcode) const;			void splitMove(MachineBasicBlock::iterator MI, unsigned NewOpcode) const;
	void splitAdjDynAlloc(MachineBasicBlock::iterator MI) const;			void splitAdjDynAlloc(MachineBasicBlock::iterator MI) const;
	[183 lines not shown]

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp

	[1,866 lines not shown]

	unsigned SystemZ::reverseCCMask(unsigned CCMask) {			unsigned SystemZ::reverseCCMask(unsigned CCMask) {
	return ((CCMask & SystemZ::CCMASK_CMP_EQ) |			return ((CCMask & SystemZ::CCMASK_CMP_EQ) |
	(CCMask & SystemZ::CCMASK_CMP_GT ? SystemZ::CCMASK_CMP_LT : 0) |			(CCMask & SystemZ::CCMASK_CMP_GT ? SystemZ::CCMASK_CMP_LT : 0) |
	(CCMask & SystemZ::CCMASK_CMP_LT ? SystemZ::CCMASK_CMP_GT : 0) |			(CCMask & SystemZ::CCMASK_CMP_LT ? SystemZ::CCMASK_CMP_GT : 0) |
	(CCMask & SystemZ::CCMASK_CMP_UO));			(CCMask & SystemZ::CCMASK_CMP_UO));
	}			}

				MachineBasicBlock *SystemZ::emitBlockAfter(MachineBasicBlock *MBB) {
				MachineFunction &MF = *MBB->getParent();
				MachineBasicBlock *NewMBB = MF.CreateMachineBasicBlock(MBB->getBasicBlock());
				MF.insert(std::next(MachineFunction::iterator(MBB)), NewMBB);
				return NewMBB;
				}

				MachineBasicBlock *SystemZ::splitBlockAfter(MachineBasicBlock::iterator MI,
				MachineBasicBlock *MBB) {
				MachineBasicBlock *NewMBB = emitBlockAfter(MBB);
				NewMBB->splice(NewMBB->begin(), MBB,
				std::next(MachineBasicBlock::iterator(MI)), MBB->end());
				NewMBB->transferSuccessorsAndUpdatePHIs(MBB);
				return NewMBB;
				}

				MachineBasicBlock *SystemZ::splitBlockBefore(MachineBasicBlock::iterator MI,
				MachineBasicBlock *MBB) {
				MachineBasicBlock *NewMBB = emitBlockAfter(MBB);
				NewMBB->splice(NewMBB->begin(), MBB, MI, MBB->end());
				NewMBB->transferSuccessorsAndUpdatePHIs(MBB);
				return NewMBB;
				}
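
The two split helpers differ only in where the cut is made. As a rough analogy (an assumption-laden sketch using std::list<int> in place of a MachineBasicBlock, with the hypothetical names splitBefore and splitAfter), the splice-based idea looks like this; successor and PHI updates have no counterpart here.

```cpp
#include <cassert>
#include <iterator>
#include <list>

// Move [It, end) of Block into a new list and return it, analogous to
// splitBlockBefore(): the returned "block" contains the element at It.
std::list<int> splitBefore(std::list<int> &Block,
                           std::list<int>::iterator It) {
  std::list<int> NewBlock;
  NewBlock.splice(NewBlock.begin(), Block, It, Block.end());
  return NewBlock;
}

// Analogous to splitBlockAfter(): the element at It stays in Block and
// everything after it moves to the returned list.
std::list<int> splitAfter(std::list<int> &Block,
                          std::list<int>::iterator It) {
  return splitBefore(Block, std::next(It));
}
```

For a list {1, 2, 3, 4}, splitting before the element 3 and splitting after the element 2 both leave {1, 2} behind and return {3, 4}.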

	unsigned SystemZInstrInfo::getLoadAndTrap(unsigned Opcode) const {			unsigned SystemZInstrInfo::getLoadAndTrap(unsigned Opcode) const {
	if (!STI.hasLoadAndTrap())			if (!STI.hasLoadAndTrap())
	return 0;			return 0;
	switch (Opcode) {			switch (Opcode) {
	case SystemZ::L:			case SystemZ::L:
	case SystemZ::LY:			case SystemZ::LY:
	return SystemZ::LAT;			return SystemZ::LAT;
	case SystemZ::LG:			case SystemZ::LG:
	[86 lines not shown]

llvm/lib/Target/SystemZ/SystemZInstrInfo.td

	[23 lines not shown]
	// allocated area itself, skipping the outgoing arguments.			// allocated area itself, skipping the outgoing arguments.
	//			//
	// This expands to an LA or LAY instruction. We restrict the offset			// This expands to an LA or LAY instruction. We restrict the offset
	// to the range of LA and keep the LAY range in reserve for when			// to the range of LA and keep the LAY range in reserve for when
	// the size of the outgoing arguments is added.			// the size of the outgoing arguments is added.
	def ADJDYNALLOC : Pseudo<(outs GR64:$dst), (ins dynalloc12only:$src),			def ADJDYNALLOC : Pseudo<(outs GR64:$dst), (ins dynalloc12only:$src),
	[(set GR64:$dst, dynalloc12only:$src)]>;			[(set GR64:$dst, dynalloc12only:$src)]>;

				let Defs = [R15D, CC], Uses = [R15D], hasNoSchedulingInfo = 1,
				usesCustomInserter = 1 in
				def PROBED_ALLOCA : Pseudo<(outs GR64:$dst),
				(ins GR64:$oldSP, GR64:$space),
				[(set GR64:$dst, (z_probed_alloca GR64:$oldSP, GR64:$space))]>;

				let Defs = [R1D, R15D, CC], Uses = [R15D], hasNoSchedulingInfo = 1,
				uweigand (inline comment, done): Doesn't this also need Defs and Uses to be fully correct?
				jonpa (author, inline comment, done): Ah, yes, I suppose that might as well be there... I added the defs and uses and also the side-effects flag, since it may expand into a loop.
				hasSideEffects = 1 in
				uweigand (inline comment, done): For consistency with the PROBED_ALLOCA name, maybe this should be called PROBED_STACKALLOC?
				def PROBED_STACKALLOC : Pseudo<(outs), (ins i64imm:$stacksize), []>;

	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	// Branch instructions			// Branch instructions
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	// Conditional branches.			// Conditional branches.
	let isBranch = 1, isTerminator = 1, Uses = [CC] in {			let isBranch = 1, isTerminator = 1, Uses = [CC] in {
	// It's easier for LLVM to handle these branches in their raw BRC/BRCL form			// It's easier for LLVM to handle these branches in their raw BRC/BRCL form
	[2,269 lines not shown]

llvm/lib/Target/SystemZ/SystemZOperators.td

[34 lines not shown]
def SDT_ZWrapPtr : SDTypeProfile<1, 1,		def SDT_ZWrapPtr : SDTypeProfile<1, 1,
[SDTCisSameAs<0, 1>,		[SDTCisSameAs<0, 1>,
SDTCisPtrTy<0>]>;		SDTCisPtrTy<0>]>;
def SDT_ZWrapOffset : SDTypeProfile<1, 2,		def SDT_ZWrapOffset : SDTypeProfile<1, 2,
[SDTCisSameAs<0, 1>,		[SDTCisSameAs<0, 1>,
SDTCisSameAs<0, 2>,		SDTCisSameAs<0, 2>,
SDTCisPtrTy<0>]>;		SDTCisPtrTy<0>]>;
def SDT_ZAdjDynAlloc : SDTypeProfile<1, 0, [SDTCisVT<0, i64>]>;		def SDT_ZAdjDynAlloc : SDTypeProfile<1, 0, [SDTCisVT<0, i64>]>;
		def SDT_ZProbedAlloca : SDTypeProfile<1, 2,
		[SDTCisSameAs<0, 1>,
		SDTCisSameAs<0, 2>,
		SDTCisPtrTy<0>]>;
def SDT_ZGR128Binary : SDTypeProfile<1, 2,		def SDT_ZGR128Binary : SDTypeProfile<1, 2,
[SDTCisVT<0, untyped>,		[SDTCisVT<0, untyped>,
SDTCisInt<1>,		SDTCisInt<1>,
SDTCisInt<2>]>;		SDTCisInt<2>]>;
def SDT_ZBinaryWithFlags : SDTypeProfile<2, 2,		def SDT_ZBinaryWithFlags : SDTypeProfile<2, 2,
[SDTCisInt<0>,		[SDTCisInt<0>,
SDTCisVT<1, i32>,		SDTCisVT<1, i32>,
SDTCisSameAs<0, 2>,		SDTCisSameAs<0, 2>,
[213 lines not shown]	def z_strict_fcmps : SDNode<"SystemZISD::STRICT_FCMPS", SDT_ZCmp,
[SDNPHasChain]>;		[SDNPHasChain]>;
def z_tm : SDNode<"SystemZISD::TM", SDT_ZICmp>;		def z_tm : SDNode<"SystemZISD::TM", SDT_ZICmp>;
def z_br_ccmask_1 : SDNode<"SystemZISD::BR_CCMASK", SDT_ZBRCCMask,		def z_br_ccmask_1 : SDNode<"SystemZISD::BR_CCMASK", SDT_ZBRCCMask,
[SDNPHasChain]>;		[SDNPHasChain]>;
def z_select_ccmask_1 : SDNode<"SystemZISD::SELECT_CCMASK",		def z_select_ccmask_1 : SDNode<"SystemZISD::SELECT_CCMASK",
SDT_ZSelectCCMask>;		SDT_ZSelectCCMask>;
def z_ipm_1 : SDNode<"SystemZISD::IPM", SDT_ZIPM>;		def z_ipm_1 : SDNode<"SystemZISD::IPM", SDT_ZIPM>;
def z_adjdynalloc : SDNode<"SystemZISD::ADJDYNALLOC", SDT_ZAdjDynAlloc>;		def z_adjdynalloc : SDNode<"SystemZISD::ADJDYNALLOC", SDT_ZAdjDynAlloc>;
		def z_probed_alloca : SDNode<"SystemZISD::PROBED_ALLOCA", SDT_ZProbedAlloca,
		[SDNPHasChain]>;
def z_popcnt : SDNode<"SystemZISD::POPCNT", SDTIntUnaryOp>;		def z_popcnt : SDNode<"SystemZISD::POPCNT", SDTIntUnaryOp>;
def z_smul_lohi : SDNode<"SystemZISD::SMUL_LOHI", SDT_ZGR128Binary>;		def z_smul_lohi : SDNode<"SystemZISD::SMUL_LOHI", SDT_ZGR128Binary>;
def z_umul_lohi : SDNode<"SystemZISD::UMUL_LOHI", SDT_ZGR128Binary>;		def z_umul_lohi : SDNode<"SystemZISD::UMUL_LOHI", SDT_ZGR128Binary>;
def z_sdivrem : SDNode<"SystemZISD::SDIVREM", SDT_ZGR128Binary>;		def z_sdivrem : SDNode<"SystemZISD::SDIVREM", SDT_ZGR128Binary>;
def z_udivrem : SDNode<"SystemZISD::UDIVREM", SDT_ZGR128Binary>;		def z_udivrem : SDNode<"SystemZISD::UDIVREM", SDT_ZGR128Binary>;
def z_saddo : SDNode<"SystemZISD::SADDO", SDT_ZBinaryWithFlags>;		def z_saddo : SDNode<"SystemZISD::SADDO", SDT_ZBinaryWithFlags>;
def z_ssubo : SDNode<"SystemZISD::SSUBO", SDT_ZBinaryWithFlags>;		def z_ssubo : SDNode<"SystemZISD::SSUBO", SDT_ZBinaryWithFlags>;
def z_uaddo : SDNode<"SystemZISD::UADDO", SDT_ZBinaryWithFlags>;		def z_uaddo : SDNode<"SystemZISD::UADDO", SDT_ZBinaryWithFlags>;
[642 lines not shown]

llvm/test/CodeGen/SystemZ/stack-clash-dynamic-alloca.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z14 | FileCheck %s

				define i32 @fun0(i32 %n) #0 {
				; CHECK-LABEL: fun0:
				; CHECK: # %bb.0:
				; CHECK-NEXT: stmg %r11, %r15, 88(%r15)
				; CHECK-NEXT: .cfi_offset %r11, -72
				; CHECK-NEXT: .cfi_offset %r15, -40
				; CHECK-NEXT: aghi %r15, -160
				; CHECK-NEXT: .cfi_def_cfa_offset 320
				; CHECK-NEXT: lgr %r11, %r15
				; CHECK-NEXT: .cfi_def_cfa_register %r11
				; CHECK-NEXT: # kill: def $r2l killed $r2l def $r2d
				; CHECK-NEXT: risbgn %r1, %r2, 30, 189, 2
				; CHECK-NEXT: la %r0, 7(%r1)
				; CHECK-NEXT: risbgn %r1, %r0, 29, 188, 0
				; CHECK-NEXT: clgfi %r1, 4096
				; CHECK-NEXT: jl .LBB0_2
				; CHECK-NEXT: .LBB0_1: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: slgfi %r1, 4096
				; CHECK-NEXT: slgfi %r15, 4096
				; CHECK-NEXT: cg %r15, 4088(%r15)
				; CHECK-NEXT: clgfi %r1, 4096
				; CHECK-NEXT: jhe .LBB0_1
				; CHECK-NEXT: .LBB0_2:
				; CHECK-NEXT: cgije %r1, 0, .LBB0_4
				; CHECK-NEXT: # %bb.3:
				; CHECK-NEXT: slgr %r15, %r1
				; CHECK-NEXT: cg %r15, -8(%r1,%r15)
				; CHECK-NEXT: .LBB0_4:
				; CHECK-NEXT: la %r1, 160(%r15)
				; CHECK-NEXT: lhi %r0, 1
				; CHECK-NEXT: sty %r0, 4792(%r1)
				; CHECK-NEXT: l %r2, 0(%r1)
				; CHECK-NEXT: lmg %r11, %r15, 248(%r11)
				; CHECK-NEXT: br %r14

				%a = alloca i32, i32 %n
				%b = getelementptr inbounds i32, i32* %a, i64 1198
				store volatile i32 1, i32* %b
				%c = load volatile i32, i32* %a
				ret i32 %c
				}

				; Probe size should be rounded down to a multiple of the stack alignment (1250 becomes 1248).
				define i32 @fun1(i32 %n) #0 "stack-probe-size"="1250" {
				; CHECK-LABEL: fun1:
				; CHECK: # %bb.0:
				; CHECK-NEXT: stmg %r11, %r15, 88(%r15)
				; CHECK-NEXT: .cfi_offset %r11, -72
				; CHECK-NEXT: .cfi_offset %r15, -40
				; CHECK-NEXT: aghi %r15, -160
				; CHECK-NEXT: .cfi_def_cfa_offset 320
				; CHECK-NEXT: lgr %r11, %r15
				; CHECK-NEXT: .cfi_def_cfa_register %r11
				; CHECK-NEXT: # kill: def $r2l killed $r2l def $r2d
				; CHECK-NEXT: risbgn %r1, %r2, 30, 189, 2
				; CHECK-NEXT: la %r0, 7(%r1)
				; CHECK-NEXT: risbgn %r1, %r0, 29, 188, 0
				; CHECK-NEXT: clgfi %r1, 1248
				; CHECK-NEXT: jl .LBB1_2
				; CHECK-NEXT: .LBB1_1: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: slgfi %r1, 1248
				; CHECK-NEXT: slgfi %r15, 1248
				; CHECK-NEXT: cg %r15, 1240(%r15)
				; CHECK-NEXT: clgfi %r1, 1248
				; CHECK-NEXT: jhe .LBB1_1
				; CHECK-NEXT: .LBB1_2:
				; CHECK-NEXT: cgije %r1, 0, .LBB1_4
				; CHECK-NEXT: # %bb.3:
				; CHECK-NEXT: slgr %r15, %r1
				; CHECK-NEXT: cg %r15, -8(%r1,%r15)
				; CHECK-NEXT: .LBB1_4:
				; CHECK-NEXT: la %r1, 160(%r15)
				; CHECK-NEXT: lhi %r0, 1
				; CHECK-NEXT: sty %r0, 4792(%r1)
				; CHECK-NEXT: l %r2, 0(%r1)
				; CHECK-NEXT: lmg %r11, %r15, 248(%r11)
				; CHECK-NEXT: br %r14
				%a = alloca i32, i32 %n
				%b = getelementptr inbounds i32, i32* %a, i64 1198
				store volatile i32 1, i32* %b
				%c = load volatile i32, i32* %a
				ret i32 %c
				}

				; The minimum probe size is the stack alignment.
				define i32 @fun2(i32 %n) #0 "stack-probe-size"="4" {
				; CHECK-LABEL: fun2:
				; CHECK: # %bb.0:
				; CHECK-NEXT: stmg %r11, %r15, 88(%r15)
				; CHECK-NEXT: .cfi_offset %r11, -72
				; CHECK-NEXT: .cfi_offset %r15, -40
				; CHECK-NEXT: lgr %r1, %r15
				; CHECK-NEXT: .cfi_def_cfa_register %r1
				; CHECK-NEXT: aghi %r1, -160
				; CHECK-NEXT: .cfi_def_cfa_offset 320
				; CHECK-NEXT: .LBB2_1: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: aghi %r15, -8
				; CHECK-NEXT: cg %r0, 0(%r15)
				; CHECK-NEXT: clgrjh %r15, %r1, .LBB2_1
				; CHECK-NEXT: # %bb.2:
				; CHECK-NEXT: .cfi_def_cfa_register %r15
				; CHECK-NEXT: lgr %r11, %r15
				; CHECK-NEXT: .cfi_def_cfa_register %r11
				; CHECK-NEXT: # kill: def $r2l killed $r2l def $r2d
				; CHECK-NEXT: risbgn %r1, %r2, 30, 189, 2
				; CHECK-NEXT: la %r0, 7(%r1)
				; CHECK-NEXT: risbgn %r1, %r0, 29, 188, 0
				; CHECK-NEXT: clgijl %r1, 8, .LBB2_4
				; CHECK-NEXT: .LBB2_3: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: slgfi %r1, 8
				; CHECK-NEXT: slgfi %r15, 8
				; CHECK-NEXT: cg %r15, 0(%r15)
				; CHECK-NEXT: clgijhe %r1, 8, .LBB2_3
				; CHECK-NEXT: .LBB2_4:
				; CHECK-NEXT: cgije %r1, 0, .LBB2_6
				; CHECK-NEXT: # %bb.5:
				; CHECK-NEXT: slgr %r15, %r1
				; CHECK-NEXT: cg %r15, -8(%r1,%r15)
				; CHECK-NEXT: .LBB2_6:
				; CHECK-NEXT: la %r1, 160(%r15)
				; CHECK-NEXT: lhi %r0, 1
				; CHECK-NEXT: sty %r0, 4792(%r1)
				; CHECK-NEXT: l %r2, 0(%r1)
				; CHECK-NEXT: lmg %r11, %r15, 248(%r11)
				; CHECK-NEXT: br %r14
				%a = alloca i32, i32 %n
				%b = getelementptr inbounds i32, i32* %a, i64 1198
				store volatile i32 1, i32* %b
				%c = load volatile i32, i32* %a
				ret i32 %c
				}

				attributes #0 = {"probe-stack"="inline-asm"}
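
The three functions above differ only in their "stack-probe-size" attribute, and the CHECK lines show the step actually used (4096 by default, 1248 for a request of 1250, 8 for a request of 4). The following C++ sketch models that arithmetic and the shape of the probing loop as read off the expected assembly. It is not the backend code: the helper names `effectiveProbeSize`, `allocaBytes` and `countDynamicProbes` are hypothetical, and the stack alignment of 8 bytes is an assumption inferred from the 8-byte steps in fun2.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the probe step the CHECK lines suggest for a requested
// "stack-probe-size". StackAlign = 8 is an assumption taken from fun2.
uint64_t effectiveProbeSize(uint64_t Requested, uint64_t StackAlign = 8) {
  assert(StackAlign != 0 && "alignment must be non-zero");
  uint64_t Rounded = Requested - Requested % StackAlign; // round down
  return Rounded < StackAlign ? StackAlign : Rounded;    // enforce the minimum
}

// Hypothetical model of the byte count for `alloca i32, i32 %n`:
// 4 * n rounded up to a multiple of 8 (the risbgn / la / risbgn sequence).
uint64_t allocaBytes(uint32_t N) {
  return (uint64_t(N) * 4 + 7) & ~uint64_t(7);
}

struct DynamicProbeCount {
  uint64_t LoopProbes; // probes issued by the block-sized loop
  bool ResidualProbe;  // whether the remaining tail is allocated and probed
};

// Mirrors the control flow in the CHECK lines: subtract ProbeSize while at
// least one full block remains, then probe the tail only if it is non-zero.
DynamicProbeCount countDynamicProbes(uint64_t Bytes, uint64_t ProbeSize) {
  assert(ProbeSize != 0 && "probe size must be non-zero");
  DynamicProbeCount C{0, false};
  while (Bytes >= ProbeSize) {
    Bytes -= ProbeSize;
    ++C.LoopProbes;
  }
  C.ResidualProbe = Bytes != 0;
  return C;
}
```

Under these assumptions, a request of 1199 elements gives 4800 bytes, which the default step covers with one loop probe plus one residual probe.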

llvm/test/CodeGen/SystemZ/stack-clash-protection.ll

This file was added.

				; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
				; RUN: llc < %s -mtriple=s390x-linux-gnu -mcpu=z14 -O3 | FileCheck %s
				;
				; Test stack clash protection probing for static allocas.

				; Small: one probe.
				define i32 @fun0() #0 {
				; CHECK-LABEL: fun0:
				; CHECK: # %bb.0:
				; CHECK-NEXT: aghi %r15, -560
				; CHECK-NEXT: .cfi_def_cfa_offset 720
				; CHECK-NEXT: cg %r0, 552(%r15)
				; CHECK-NEXT: mvhi 552(%r15), 1
				; CHECK-NEXT: l %r2, 160(%r15)
				; CHECK-NEXT: aghi %r15, 560
				; CHECK-NEXT: br %r14

				%a = alloca i32, i64 100
				%b = getelementptr inbounds i32, i32* %a, i64 98
				store volatile i32 1, i32* %b
				%c = load volatile i32, i32* %a
				ret i32 %c
				}

				; Medium: two probes.
				define i32 @fun1() #0 {
				; CHECK-LABEL: fun1:
				; CHECK: # %bb.0:
				; CHECK-NEXT: aghi %r15, -4096
				; CHECK-NEXT: .cfi_def_cfa_offset 4256
				; CHECK-NEXT: cg %r0, 4088(%r15)
				; CHECK-NEXT: aghi %r15, -4080
				; CHECK-NEXT: .cfi_def_cfa_offset 8336
				; CHECK-NEXT: cg %r0, 4072(%r15)
				; CHECK-NEXT: mvhi 976(%r15), 1
				; CHECK-NEXT: l %r2, 176(%r15)
				; CHECK-NEXT: aghi %r15, 8176
				; CHECK-NEXT: br %r14

				%a = alloca i32, i64 2000
				%b = getelementptr inbounds i32, i32* %a, i64 200
				store volatile i32 1, i32* %b
				%c = load volatile i32, i32* %a
				ret i32 %c
				}

				; Large: Use a loop to allocate and probe in steps.
				define i32 @fun2() #0 {
				; CHECK-LABEL: fun2:
				; CHECK: # %bb.0:
				; CHECK-NEXT: lgr %r1, %r15
				; CHECK-NEXT: .cfi_def_cfa_register %r1
				; CHECK-NEXT: agfi %r1, -69632
				; CHECK-NEXT: .cfi_def_cfa_offset 69792
				; CHECK-NEXT: .LBB2_1: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: aghi %r15, -4096
				; CHECK-NEXT: cg %r0, 4088(%r15)
				; CHECK-NEXT: clgrjh %r15, %r1, .LBB2_1
				; CHECK-NEXT: # %bb.2:
				; CHECK-NEXT: .cfi_def_cfa_register %r15
				; CHECK-NEXT: aghi %r15, -2544
				; CHECK-NEXT: .cfi_def_cfa_offset 72336
				; CHECK-NEXT: cg %r0, 2536(%r15)
				; CHECK-NEXT: lhi %r0, 1
				; CHECK-NEXT: mvhi 568(%r15), 1
				; CHECK-NEXT: sty %r0, 28968(%r15)
				; CHECK-NEXT: l %r2, 176(%r15)
				; CHECK-NEXT: agfi %r15, 72176
				; CHECK-NEXT: br %r14

				%a = alloca i32, i64 18000
				%b0 = getelementptr inbounds i32, i32* %a, i64 98
				%b1 = getelementptr inbounds i32, i32* %a, i64 7198
				store volatile i32 1, i32* %b0
				store volatile i32 1, i32* %b1
				%c = load volatile i32, i32* %a
				ret i32 %c
				}

				; Ends evenly on the step so no remainder needed.
				define void @fun3() #0 {
				; CHECK-LABEL: fun3:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lgr %r1, %r15
				; CHECK-NEXT: .cfi_def_cfa_register %r1
				; CHECK-NEXT: aghi %r1, -28672
				; CHECK-NEXT: .cfi_def_cfa_offset 28832
				; CHECK-NEXT: .LBB3_1: # %entry
				; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: aghi %r15, -4096
				; CHECK-NEXT: cg %r0, 4088(%r15)
				; CHECK-NEXT: clgrjh %r15, %r1, .LBB3_1
				; CHECK-NEXT: # %bb.2: # %entry
				; CHECK-NEXT: .cfi_def_cfa_register %r15
				; CHECK-NEXT: mvhi 180(%r15), 0
				; CHECK-NEXT: l %r0, 180(%r15)
				; CHECK-NEXT: aghi %r15, 28672
				; CHECK-NEXT: br %r14
				entry:
				%stack = alloca [7122 x i32], align 4
				%i = alloca i32, align 4
				%0 = bitcast [7122 x i32]* %stack to i8*
				%i.0.i.0..sroa_cast = bitcast i32* %i to i8*
				store volatile i32 0, i32* %i, align 4
				%i.0.i.0.6 = load volatile i32, i32* %i, align 4
				ret void
				}

				; Loop with bigger step.
				define void @fun4() #0 "stack-probe-size"="8192" {
				; CHECK-LABEL: fun4:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lgr %r1, %r15
				; CHECK-NEXT: .cfi_def_cfa_register %r1
				; CHECK-NEXT: aghi %r1, -24576
				; CHECK-NEXT: .cfi_def_cfa_offset 24736
				; CHECK-NEXT: .LBB4_1: # %entry
				; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: aghi %r15, -8192
				; CHECK-NEXT: cg %r0, 8184(%r15)
				; CHECK-NEXT: clgrjh %r15, %r1, .LBB4_1
				; CHECK-NEXT: # %bb.2: # %entry
				; CHECK-NEXT: .cfi_def_cfa_register %r15
				; CHECK-NEXT: aghi %r15, -7608
				; CHECK-NEXT: .cfi_def_cfa_offset 32344
				; CHECK-NEXT: cg %r0, 7600(%r15)
				; CHECK-NEXT: mvhi 180(%r15), 0
				; CHECK-NEXT: l %r0, 180(%r15)
				; CHECK-NEXT: aghi %r15, 32184
				; CHECK-NEXT: br %r14
				entry:
				%stack = alloca [8000 x i32], align 4
				%i = alloca i32, align 4
				%0 = bitcast [8000 x i32]* %stack to i8*
				%i.0.i.0..sroa_cast = bitcast i32* %i to i8*
				store volatile i32 0, i32* %i, align 4
				%i.0.i.0.6 = load volatile i32, i32* %i, align 4
				ret void
				}

				; Probe size should be rounded down to a multiple of the stack alignment (4100 becomes 4096).
				define void @fun5() #0 "stack-probe-size"="4100" {
				; CHECK-LABEL: fun5:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: aghi %r15, -4096
				; CHECK-NEXT: .cfi_def_cfa_offset 4256
				; CHECK-NEXT: cg %r0, 4088(%r15)
				; CHECK-NEXT: aghi %r15, -88
				; CHECK-NEXT: .cfi_def_cfa_offset 4344
				; CHECK-NEXT: cg %r0, 80(%r15)
				; CHECK-NEXT: mvhi 180(%r15), 0
				; CHECK-NEXT: l %r0, 180(%r15)
				; CHECK-NEXT: aghi %r15, 4184
				; CHECK-NEXT: br %r14
				entry:
				%stack = alloca [1000 x i32], align 4
				%i = alloca i32, align 4
				%0 = bitcast [1000 x i32]* %stack to i8*
				%i.0.i.0..sroa_cast = bitcast i32* %i to i8*
				store volatile i32 0, i32* %i, align 4
				%i.0.i.0.6 = load volatile i32, i32* %i, align 4
				ret void
				}

				; The minimum probe size is the stack alignment.
				define void @fun6() #0 "stack-probe-size"="5" {
				; CHECK-LABEL: fun6:
				; CHECK: # %bb.0: # %entry
				; CHECK-NEXT: lgr %r1, %r15
				; CHECK-NEXT: .cfi_def_cfa_register %r1
				; CHECK-NEXT: aghi %r1, -4184
				; CHECK-NEXT: .cfi_def_cfa_offset 4344
				; CHECK-NEXT: .LBB6_1: # %entry
				; CHECK-NEXT: # =>This Inner Loop Header: Depth=1
				; CHECK-NEXT: aghi %r15, -8
				; CHECK-NEXT: cg %r0, 0(%r15)
				; CHECK-NEXT: clgrjh %r15, %r1, .LBB6_1
				; CHECK-NEXT: # %bb.2: # %entry
				; CHECK-NEXT: .cfi_def_cfa_register %r15
				; CHECK-NEXT: mvhi 180(%r15), 0
				; CHECK-NEXT: l %r0, 180(%r15)
				; CHECK-NEXT: aghi %r15, 4184
				; CHECK-NEXT: br %r14
				entry:
				%stack = alloca [1000 x i32], align 4
				%i = alloca i32, align 4
				%0 = bitcast [1000 x i32]* %stack to i8*
				%i.0.i.0..sroa_cast = bitcast i32* %i to i8*
				store volatile i32 0, i32* %i, align 4
				%i.0.i.0.6 = load volatile i32, i32* %i, align 4
				ret void
				}

				; Small with a natural probe (STMG) - needs no extra probe.
				define i32 @fun7() #0 {
				; CHECK-LABEL: fun7:
				; CHECK: # %bb.0:
				; CHECK-NEXT: stmg %r14, %r15, 112(%r15)
				; CHECK-NEXT: .cfi_offset %r14, -48
				; CHECK-NEXT: .cfi_offset %r15, -40
				; CHECK-NEXT: aghi %r15, -3976
				; CHECK-NEXT: .cfi_def_cfa_offset 4136
				; CHECK-NEXT: brasl %r14, foo@PLT
				; CHECK-NEXT: st %r2, 568(%r15)
				; CHECK-NEXT: l %r2, 176(%r15)
				; CHECK-NEXT: lmg %r14, %r15, 4088(%r15)
				; CHECK-NEXT: br %r14
				%v = call i32 @foo()
				%a = alloca i32, i64 950
				%b = getelementptr inbounds i32, i32* %a, i64 98
				store volatile i32 %v, i32* %b
				%c = load volatile i32, i32* %a
				ret i32 %c
				}

				; Medium with an STMG - still needs probing.
				define i32 @fun8() #0 {
				; CHECK-LABEL: fun8:
				; CHECK: # %bb.0:
				; CHECK-NEXT: stmg %r14, %r15, 112(%r15)
				; CHECK-NEXT: .cfi_offset %r14, -48
				; CHECK-NEXT: .cfi_offset %r15, -40
				; CHECK-NEXT: aghi %r15, -3984
				; CHECK-NEXT: .cfi_def_cfa_offset 4144
				; CHECK-NEXT: cg %r0, 3976(%r15)
				; CHECK-NEXT: brasl %r14, foo@PLT
				; CHECK-NEXT: st %r2, 976(%r15)
				; CHECK-NEXT: l %r2, 176(%r15)
				; CHECK-NEXT: lmg %r14, %r15, 4096(%r15)
				; CHECK-NEXT: br %r14

				%v = call i32 @foo()
				%a = alloca i32, i64 952
				%b = getelementptr inbounds i32, i32* %a, i64 200
				store volatile i32 %v, i32* %b
				%c = load volatile i32, i32* %a
				ret i32 %c
				}

				declare i32 @foo()
				attributes #0 = { "probe-stack"="inline-asm" }
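
The static tests above split each frame into full probe-sized blocks plus an optional residual, and use either unrolled allocations or a loop. The C++ sketch below reproduces that split for fun0 through fun6, taking the frame sizes from the epilogue adjustments in the CHECK lines. The name `planStaticProbes` and the `LoopThreshold` default of 3 are assumptions: the tests only establish that one full block is unrolled (fun1, fun5) and that three or more use a loop (fun2, fun3, fun4, fun6). The natural-probe case with STMG (fun7, fun8) is deliberately not modeled.

```cpp
#include <cassert>
#include <cstdint>

struct StaticProbePlan {
  uint64_t FullBlocks; // allocations of exactly ProbeSize bytes
  uint64_t Residual;   // final allocation smaller than ProbeSize (0 if none)
  bool UsesLoop;       // true if the full blocks are covered by one loop
};

// ProbeSize is the already-adjusted step (for example 4096, or 8 for fun6).
// LoopThreshold = 3 is an assumption, not a value confirmed by the tests.
StaticProbePlan planStaticProbes(uint64_t FrameSize, uint64_t ProbeSize,
                                 uint64_t LoopThreshold = 3) {
  assert(ProbeSize != 0 && "probe size must be non-zero");
  StaticProbePlan P;
  P.FullBlocks = FrameSize / ProbeSize;
  P.Residual = FrameSize % ProbeSize;
  P.UsesLoop = P.FullBlocks >= LoopThreshold;
  return P;
}
```

For fun2, a 72176-byte frame with a 4096-byte step yields 17 looped blocks (69632 bytes, matching `agfi %r1, -69632`) and a 2544-byte residual (matching `aghi %r15, -2544`).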

Revision Contents

Diff 269023

clang/docs/ReleaseNotes.rst

clang/lib/Basic/Targets/SystemZ.h

clang/lib/Driver/ToolChains/Clang.cpp

clang/test/CodeGen/stack-clash-protection.c

clang/test/Driver/stack-clash-protection-02.c

llvm/include/llvm/ADT/Triple.h

llvm/lib/Target/SystemZ/SystemZFrameLowering.h

llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp

llvm/lib/Target/SystemZ/SystemZISelLowering.h

llvm/lib/Target/SystemZ/SystemZISelLowering.cpp

llvm/lib/Target/SystemZ/SystemZInstrInfo.h

llvm/lib/Target/SystemZ/SystemZInstrInfo.cpp

llvm/lib/Target/SystemZ/SystemZInstrInfo.td

llvm/lib/Target/SystemZ/SystemZOperators.td

llvm/test/CodeGen/SystemZ/stack-clash-dynamic-alloca.ll

llvm/test/CodeGen/SystemZ/stack-clash-protection.ll
