This is an archive of the discontinued LLVM Phabricator instance.

[ARM] Glue register copies to tail calls.
Closed · Public

Authored by efriedma on Apr 8 2019, 4:39 PM.

Details

Reviewers

dmgreen
t.p.northover

Commits

rZORG2ffbf9a9ac09: [ARM] Glue register copies to tail calls.
rZORG61665e006096: [ARM] Glue register copies to tail calls.
rG2ffbf9a9ac09: [ARM] Glue register copies to tail calls.
rG61665e006096: [ARM] Glue register copies to tail calls.
rG2570e4bb99c9: Merging r360099:
rL360793: Merging r360099:
rG2ea088173df0: [ARM] Glue register copies to tail calls.
rL360099: [ARM] Glue register copies to tail calls.

Summary

This generally follows what other targets do. I don't completely understand why the special case for tail calls existed in the first place; even when the code was committed in r105413, call lowering didn't work in the way described in the comments.

Stack protector lowering breaks if the register copies are not glued to a tail call: we have to insert the stack protector check before the tail call, and we choose the location based on the assumption that all physical register dependencies of a tail call are adjacent to the tail call. (See FindSplitPointForStackProtector.) This is sort of fragile, but I don't see any reason to break that assumption.

I'm guessing nobody has seen this before just because it's hard to convince the scheduler to actually schedule the code in a way that breaks; even without the glue, the only computation that could actually be scheduled after the register copies is the computation of the call address, and the scheduler usually prefers to schedule that before the copies anyway.
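The adjacency assumption described above can be illustrated with a toy model. This is a minimal sketch, not the code of FindSplitPointForStackProtector: the `Kind` enum and the helpers `findSplitPoint` and `copyStrandedAboveSplit` are hypothetical names invented for this illustration, and a basic block is reduced to a flat list of instruction kinds. The search walks backwards from the tail call over the register copies directly above it; if the scheduler has placed another instruction (for example, the call address computation) between the copies and the call, the copies end up above the point where the check would be inserted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy instruction kinds; illustrative only, not LLVM opcodes.
enum class Kind { Other, CopyToPhysReg, TailCall };

// Toy split-point search: start at the tail call that ends the block and
// walk backwards over the register copies directly above it. The returned
// index is where a stack protector check would be inserted.
std::size_t findSplitPoint(const std::vector<Kind> &Block) {
  assert(!Block.empty() && Block.back() == Kind::TailCall);
  std::size_t Split = Block.size() - 1;
  while (Split > 0 && Block[Split - 1] == Kind::CopyToPhysReg)
    --Split;
  return Split;
}

// Returns true if an argument copy sits above the split point, meaning a
// check inserted at the split point would separate that copy from the call.
bool copyStrandedAboveSplit(const std::vector<Kind> &Block) {
  const std::size_t Split = findSplitPoint(Block);
  for (std::size_t I = 0; I < Split; ++I)
    if (Block[I] == Kind::CopyToPhysReg)
      return true;
  return false;
}
```

In this model, a block ordered as other, copy, copy, tail call keeps both copies below the split point, while copy, copy, other, tail call strands both copies above it. Gluing the copies to the call rules out the second ordering.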

Fixes https://bugs.llvm.org/show_bug.cgi?id=41417

Diff Detail

Repository: rL LLVM

Event Timeline

efriedma created this revision. Apr 8 2019, 4:39 PM

Herald added a project: Restricted Project. Apr 8 2019, 4:39 PM

Herald added subscribers: kristof.beyls, javed.absar.

I can confirm this fixes both the minimized test case from https://bugs.llvm.org/show_bug.cgi?id=41417, and the full original test case from https://bugs.freebsd.org/237074.

LGTM then.

Inline comment by dmgreen on lib/Target/ARM/ARMISelLowering.cpp, line 1991:
for (auto &RegToPass : RegsToPass)

This revision is now accepted and ready to land. Apr 10 2019, 12:39 AM

@efriedma any more work to be done on this? :)

Closed by commit rL360099: [ARM] Glue register copies to tail calls. (authored by efriedma). May 6 2019, 4:22 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                        Size
lib/Target/ARM/ARMISelLowering.cpp          30 lines
test/CodeGen/ARM/tail-call-scheduling.ll    35 lines

Diff 194220

lib/Target/ARM/ARMISelLowering.cpp

@@ ARMTargetLowering::LowerCall(TargetLowering::CallLoweringInfo &CLI,
   }
 
   if (!MemOpChains.empty())
     Chain = DAG.getNode(ISD::TokenFactor, dl, MVT::Other, MemOpChains);
 
   // Build a sequence of copy-to-reg nodes chained together with token chain
   // and flag operands which copy the outgoing args into the appropriate regs.
   SDValue InFlag;
-  // Tail call byval lowering might overwrite argument registers so in case of
-  // tail call optimization the copies to registers are lowered later.
-  if (!isTailCall)
   for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i) {
     Chain = DAG.getCopyToReg(Chain, dl, RegsToPass[i].first,
                              RegsToPass[i].second, InFlag);
     InFlag = Chain.getValue(1);
   }
 
-  // For tail calls lower the arguments to the 'real' stack slot.
-  if (isTailCall) {
-    // Force all the incoming stack arguments to be loaded from the stack
-    // before any new outgoing arguments are stored to the stack, because the
-    // outgoing stack slots may alias the incoming argument stack slots, and
-    // the alias isn't otherwise explicit. This is slightly more conservative
-    // than necessary, because it means that each store effectively depends
-    // on every argument instead of just those arguments it would clobber.
-
-    // Do not flag preceding copytoreg stuff together with the following stuff.
-    InFlag = SDValue();
-    for (unsigned i = 0, e = RegsToPass.size(); i != e; ++i) {
-      Chain = DAG.getCopyToReg(Chain, dl, RegsToPass[i].first,
-                               RegsToPass[i].second, InFlag);
-      InFlag = Chain.getValue(1);
-    }
-    InFlag = SDValue();
-  }
-
   // If the callee is a GlobalAddress/ExternalSymbol node (quite common, every
   // direct call is) turn it into a TargetGlobalAddress/TargetExternalSymbol
   // node so that legalize doesn't hack it.
   bool isDirect = false;
 
   const TargetMachine &TM = getTargetMachine();
   const Module *Mod = MF.getFunction().getParent();
   const GlobalValue *GV = nullptr;
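For illustration, the range-based form suggested in the inline comment (`for (auto &RegToPass : RegsToPass)`) can be sketched in isolation. This is a minimal sketch under stated assumptions, not the committed code: `ToyCopy`, `ToyDAG`, and `glueCopies` are hypothetical stand-ins for SelectionDAG types, integers stand in for SDValue, and -1 stands in for an empty glue value. It shows only how each copy consumes the glue produced by the previous copy, so that the last glue value can be handed to the call node.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for a copy-to-reg node: it records the id of the
// node whose glue it consumes (-1 means no incoming glue).
struct ToyCopy {
  unsigned Reg;
  int Value;
  int GlueIn;
};

// Hypothetical stand-in for the DAG builder.
struct ToyDAG {
  std::vector<ToyCopy> Copies;

  // Appends a copy node and returns its id, which doubles as its glue result.
  int copyToReg(unsigned Reg, int Value, int GlueIn) {
    Copies.push_back({Reg, Value, GlueIn});
    return static_cast<int>(Copies.size()) - 1;
  }
};

// Range-based form of the copy loop: every copy is glued to the previous
// one, and the returned glue is what the call node would consume.
int glueCopies(ToyDAG &DAG,
               const std::vector<std::pair<unsigned, int>> &RegsToPass) {
  int InFlag = -1;
  for (const auto &RegToPass : RegsToPass)
    InFlag = DAG.copyToReg(RegToPass.first, RegToPass.second, InFlag);
  return InFlag;
}
```

The real loop also threads the token chain through DAG.getCopyToReg; the sketch omits that to keep the glue dependency visible.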

test/CodeGen/ARM/tail-call-scheduling.ll

This file was added.

; RUN: llc < %s | FileCheck %s
target triple = "armv6kz-unknown-unknown-gnueabihf"

; Make sure this doesn't crash, and we actually emit a tail call.
; Unfortunately, this test is sort of fragile... the original issue only
; shows up if scheduling happens in a very specific order. But including
; it anyway just to demonstrate the issue.
; CHECK: pop {r4, lr}

@e = external local_unnamed_addr constant [0 x i32 (i32, i32)*], align 4

; Function Attrs: nounwind sspstrong
define i32 @AVI_ChunkRead_p_chk(i32 %g) nounwind sspstrong "target-cpu"="arm1176jzf-s" {
entry:
  %b = alloca i8, align 1
  %tobool = icmp eq i32 %g, 0
  br i1 %tobool, label %if.end, label %if.then

if.then: ; preds = %entry
  %add = add nsw i32 %g, 1
  %arrayidx = getelementptr inbounds [0 x i32 (i32, i32)*], [0 x i32 (i32, i32)*]* @e, i32 0, i32 %add
  %0 = load i32 (i32, i32)*, i32 (i32, i32)** %arrayidx, align 4
  %call = tail call i32 %0(i32 0, i32 0) #3
  br label %return

if.end: ; preds = %entry
  call void @c(i8* nonnull %b)
  br label %return

return: ; preds = %if.end, %if.then
  %retval.0 = phi i32 [ %call, %if.then ], [ 0, %if.end ]
  ret i32 %retval.0
}

declare void @c(i8*)