This is an archive of the discontinued LLVM Phabricator instance.

[SystemZFrameLowering] Make sure R1 holding the backchain is not corrupted by probing
ClosedPublic

Authored by jonpa on Dec 7 2020, 5:04 PM.

Details

Summary

For a function that saves the backchain while allocating stack space with probing (stack clash protection), R1D is used for both of these purposes. The probing modifies R1D in order to have a reference value for exiting the loop. In order to save the original value of R15D as the backchain after the probing-loop, the value decremented from R1D must be added back.

An additional improvement is to not copy R15 to R1 in this case, since then it has already been done for the purpose of the backchain.

I am not sure if there is a better way, but this seems ok as long as adding it back is just a single instruction (would there be any other reg available?).

This was incidentally discovered as a big function with "backchain" and "probe-stack"="inline-asm" was discovered by csmith/machine-verifier to not have R1D live-in to DoneMBB where the backchain is saved by an STG. This was due to the curious fact that the StackAllocMI is erased *after* recomputeLiveIns(*DoneMBB) is called, and since PROBED_STACKALLOC is marked only as defining R1D, it was not live-in.

I started by adding R1D as live into DoneMBB in SystemZFrameLowering::inlineStackProbe(), but then realized that the problem seems to be even worse: R1D is used in the probing loop after first being decremented with LoopAlloc. That means that the backchain value is in fact not the incoming R15D anymore in this case.

Diff Detail

Event Timeline

jonpa created this revision.Dec 7 2020, 5:04 PM
jonpa requested review of this revision.Dec 7 2020, 5:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 7 2020, 5:04 PM

I'm not sure I like the implicit assumptions the patch makes between inlineStackProbe and emitPrologue.

I think the code would be clearer if manipulation of the backchain were moved next to manipulation of r15. That is to say, emitPrologue should only store the backchain in the case where it itself updates r15. In the case where the r15 update happens in inlineStackProbe then storing the backchain should also happen in inlineStackProbe.

Then the code in inlineStackProbe can be self-contained and use r1 however it likes. If it helps, the code might also want to use r0 as a second temporary register to hold the backchain value temporarily. Also, strictly speaking the backchain ought to stored every time the stack pointer is updated, so it could be stored every time through the loop. (Then the store would also implicitly serve as probe.)

jonpa updated this revision to Diff 310292.Dec 8 2020, 11:05 AM

Updated per review. R0D is now used for the loop exit check while probing.

jonpa added a comment.Dec 8 2020, 11:11 AM

Hmm... now that R0D is used for the loop exit, and R1D is used for the backchain, perhaps the backchain actually could be handled just in emitPrologue()?

I think it still makes sense to have the backchain store local in inlineStackProbe.

In fact, I think it would be best to have the backchain store in every iteration of the loop, i.e. to the store in allocateAndProbe (of course that means the store then implicitly acts as probe so we don't need the volatile compare any more if we have a backchain).

// The back chain is stored topmost with packed-stack.
int Offset = usePackedStack(MF) ? SystemZMC::CallFrameSize - 8 : 0;

Given that this is now duplicated, maybe it would make sense to have that logic in a separate function "getBackchainOffset(MF)" or the like.

jonpa added a comment.EditedDec 9 2020, 12:19 PM

I think it still makes sense to have the backchain store local in inlineStackProbe.

In fact, I think it would be best to have the backchain store in every iteration of the loop, i.e. to the store in allocateAndProbe (of course that means the store then implicitly acts as probe so we don't need the volatile compare any more if we have a backchain).

I remember there was an issue with "store tags" which we are handling for instance when we do loop-unrolling. But maybe that is not an issue any more on newer machines (and maybe we don't need to consider that in unrolling then either)?

uweigand accepted this revision.Dec 10 2020, 2:13 AM

This suggestion wasn't for performance reasons, but correctness. In theory, when using the backchain, it should be updated on every change to %r15 so that %r15 at all times points to a valid backchain value. Otherwise, unwinding using the backchain will randomly fail when starting at some PC where the stack pointer has been updated but the backchain not yet written.

However, thinking about this again, it probably doesn't matter in this case since there's nothing in this loop where we might be triggering unwinding. In any case, GCC also doesn't seem to update the backchain each time through the loop, so we're probably fine without.

So I think this patch LGTM now.

This suggestion:

// The back chain is stored topmost with packed-stack.
int Offset = usePackedStack(MF) ? SystemZMC::CallFrameSize - 8 : 0;

Given that this is now duplicated, maybe it would make sense to have that logic in a separate function "getBackchainOffset(MF)" or the like.

is still valid, but can be done in a separate patch. Looks like there are other places where this offset calculation is duplicated, they should really all be merged.

B.t.w. it seems lowerDYNAMIC_STACKALLOC and lowerSTACKRESTORE always use 0 as backchain offset, so those would be incorrect with the kernel backchain? I guess the kernel rarely uses dynamic stack allocation, but it seems those two places still ought to be fixed.

This revision is now accepted and ready to land.Dec 10 2020, 2:13 AM
jonpa updated this revision to Diff 311072.Dec 10 2020, 5:20 PM

Sorry - had to revert patch since the live-in lists had not been handled properly.

I think the right thing to do is to compute them after all else - the saving of the backchain and removal of the pseudo.

jonpa reopened this revision.Dec 10 2020, 5:20 PM
This revision is now accepted and ready to land.Dec 10 2020, 5:20 PM
jonpa requested review of this revision.Dec 10 2020, 5:21 PM
uweigand accepted this revision.Dec 11 2020, 12:29 AM

Ah, OK . Looks good to me.

This revision is now accepted and ready to land.Dec 11 2020, 12:29 AM