This is an archive of the discontinued LLVM Phabricator instance.

Differential D57183

[COFF, ARM64] Fix localaddress to handle stack realignment and variable size objects
ClosedPublic

Authored by mgrang on Jan 24 2019, 1:21 PM.

Details

Reviewers

rnk
efriedma
ssijaric
TomTan

Commits

rG70d484d94e3e: [COFF, ARM64] Fix localaddress to handle stack realignment and variable size…
rL352923: [COFF, ARM64] Fix localaddress to handle stack realignment and variable size…

Summary

This fixes using the correct stack registers for SEH when stack realignment is needed or when variable size objects are present.

Event Timeline

mgrang created this revision. Jan 24 2019, 1:21 PM

Herald added subscribers: kristof.beyls, javed.absar. Jan 24 2019, 1:21 PM

efriedma added inline comments. Jan 24 2019, 1:42 PM

lib/Target/AArch64/AArch64ISelLowering.cpp
Line 2750: This could use a brief comment to explain why stack realignment matters.

Why isn't SP used in this case? I'd expect that the frame pointer addresses parameters, followed by the alignment gap, followed by the locals area, which is addressable with fixed offsets from SP.

If there's a call to localaddress in a function without funclets or VLAs, we should use sp, yes. That should be rare in practice, but I guess the testcase is an example.

In D57183#1370366, @efriedma wrote:

If there's a call to localaddress in a function without funclets or VLAs, we should use sp, yes. That should be rare in practice, but I guess the testcase is an example.

So I guess the logic should be something like this:

if (!hasVarSizedObjects && !hasFunclets) --> use SP
else if (needsStackRealignment) --> use BP
else --> use FP

Yes, that looks right.
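
For readers following the thread, the three-way choice agreed on above can be sketched as a small standalone C++ model. This is only an illustration of the rule as written in the comment, not the code that landed: `FrameFacts`, `LocalAddrReg`, and `pickLocalAddressRegister` are hypothetical names, and the boolean fields stand in for the corresponding MachineFunction queries.

```cpp
// Hypothetical stand-ins for the per-function queries named in the thread.
struct FrameFacts {
  bool hasVarSizedObjects;
  bool hasFunclets;
  bool needsStackRealignment;
};

// The three candidate registers discussed in the review.
enum class LocalAddrReg { SP, BP, FP };

// Mirrors the rule proposed in the comment above (illustrative only):
//   no VLAs and no funclets -> SP
//   otherwise, realignment  -> BP
//   otherwise               -> FP
LocalAddrReg pickLocalAddressRegister(const FrameFacts &F) {
  if (!F.hasVarSizedObjects && !F.hasFunclets)
    return LocalAddrReg::SP;
  if (F.needsStackRealignment)
    return LocalAddrReg::BP;
  return LocalAddrReg::FP;
}
```

Under this model a function with funclets and an over-aligned local picks BP, the same function without realignment picks FP, and a function with neither funclets nor VLAs picks SP even if realignment is needed.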

In D57183#1370450, @mgrang wrote:
In D57183#1370366, @efriedma wrote:

If there's a call to localaddress in a function without funclets or VLAs, we should use sp, yes. That should be rare in practice, but I guess the testcase is an example.

So I guess the logic should be something like this:
if (!hasVarSizedObjects && !hasFunclets) --> use SP
else if (needsStackRealignment) --> use BP
else --> use FP

Is that not what the existing code does? RegInfo->getFrameRegister(MF) does this to choose between SP and FP:

unsigned
AArch64RegisterInfo::getFrameRegister(const MachineFunction &MF) const {
  const AArch64FrameLowering *TFI = getFrameLowering(MF);
  return TFI->hasFP(MF) ? AArch64::FP : AArch64::SP;
}

And hasBasePointer checks the same conditions you've listed here.

...

I see, hasFP returns true when there are calls and the stack frame is large, and also in this case:

// Win64 SEH requires frame pointer if funclets are present.
if (MF.hasLocalEscape())
  return true;

Are you sure we still want that there? It sounds like we are trying to allow addressing variables with SP when localescape is present.

hasBasePointer checks the same conditions

We sometimes emit a base pointer or a frame pointer when it isn't strictly necessary. We don't want to change the result of llvm.localaddress() in that case.

The MF.hasLocalEscape() check in hasFP() probably isn't necessary.

In the existing code, there are several conditions on when FP/BP/SP should be used. I have tried to summarize them here:

use BP:
  if ((hasVarSizedObjects || hasEHFunclets) && (needsStackRealignment || LocalFrameSize >= 256))

use FP:
  if (hasEHFunclets || hasVarSizedObjects || needsStackRealignment || hasLocalEscape ||
      hasCalls || isFrameAddressTaken || hasStackMap || hasPatchPoint || !MaxCallFrameSizeComputed ||
      MaxCallFrameSize > DefaultSafeSPDisplacement)

else use SP
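
The summary above can be modeled as a standalone C++ sketch to make the precedence (BP, then FP, then SP) explicit. It follows the conditions exactly as listed in this comment rather than the actual LLVM sources; `FrameState`, `FrameReg`, `existingChoice`, and the helper predicates are hypothetical names, and the value 255 for `kDefaultSafeSPDisplacement` is an assumption made only for illustration.

```cpp
// Hypothetical flags named after the conditions listed in the summary above.
struct FrameState {
  bool hasVarSizedObjects = false;
  bool hasEHFunclets = false;
  bool needsStackRealignment = false;
  bool hasLocalEscape = false;
  bool hasCalls = false;
  bool isFrameAddressTaken = false;
  bool hasStackMap = false;
  bool hasPatchPoint = false;
  bool maxCallFrameSizeComputed = true;
  unsigned maxCallFrameSize = 0;
  unsigned localFrameSize = 0;
};

// Assumed threshold, for illustration only.
constexpr unsigned kDefaultSafeSPDisplacement = 255;

enum class FrameReg { SP, FP, BP };

// "use BP" condition from the summary.
bool wantsBasePointer(const FrameState &S) {
  return (S.hasVarSizedObjects || S.hasEHFunclets) &&
         (S.needsStackRealignment || S.localFrameSize >= 256);
}

// "use FP" condition from the summary (all terms ORed, as listed there).
bool wantsFramePointer(const FrameState &S) {
  return S.hasEHFunclets || S.hasVarSizedObjects ||
         S.needsStackRealignment || S.hasLocalEscape || S.hasCalls ||
         S.isFrameAddressTaken || S.hasStackMap || S.hasPatchPoint ||
         !S.maxCallFrameSizeComputed ||
         S.maxCallFrameSize > kDefaultSafeSPDisplacement;
}

// BP is checked first, then FP, otherwise SP.
FrameReg existingChoice(const FrameState &S) {
  if (wantsBasePointer(S))
    return FrameReg::BP;
  if (wantsFramePointer(S))
    return FrameReg::FP;
  return FrameReg::SP;
}
```

In this model, a function with funclets and a small frame picks FP, while the same function with a local frame of 256 bytes or more picks BP. That sensitivity to frame size is the kind of incidental choice that should not change the result of llvm.localaddress().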

Here's my understanding of how the locals should be accessed in various scenarios:

struct S { int x; };

// Use FP to access escaped locals: (hasFunclets = true, hasVarSizedObjects = false, needsStackRealignment = false)
void simple() {
  struct S o;

  __try { o.x; }
  __finally { o.x; }
}

// Use BP to access escaped locals: (hasFunclets = true, hasVarSizedObjects = false, needsStackRealignment = true)
void stack_realignment() {
  struct S __declspec(align(32)) o;

  __try { o.x; }
  __finally { o.x; }
}

// Use BP to access escaped locals: (hasFunclets = true, hasVarSizedObjects = true, needsStackRealignment = false)
void vla_present(int n) {
  int vla[n];

  __try { vla[0]; }
  __finally { vla[0]; }
}

// Use BP to access escaped locals: (hasFunclets = true, hasVarSizedObjects = true, needsStackRealignment = true)
void all(int n) {
  struct S __declspec(align(32)) o;
  int vla[n];
  
  __try { o.x; vla[0]; }
  __finally { o.x; vla[0]; }
}

// Use SP to access locals: (hasFunclets = false, hasVarSizedObjects = false, needsStackRealignment = false)
void non_seh() {
  // call to llvm.localaddress();
}

@rnk @efriedma Could you please comment on whether all the scenarios have been captured here and whether the behavior is as expected?

use FP:

if (hasEHFunclets || hasVarSizedObjects || needsStackRealignment || hasLocalEscape ||
    hasCalls || isFrameAddressTaken || hasStackMap || hasPatchPoint || !MaxCallFrameSizeComputed ||
    MaxCallFrameSize > DefaultSafeSPDisplacement)

Should FP be used if needsStackRealignment is true and none of the other conditions are true? I would've expected that SP needs to be used, because FP points to the parameter space before the stack was realigned.

I suppose that it is correct to force a frame to use FP when hasLocalEscape is true, assuming we've already checked the conditions under which BP is needed.

// Use BP to access escaped locals: (hasFunclets = true, hasVarSizedObjects = true, needsStackRealignment = false)
void vla_present(int n) {

Should still use fp here. (Having both VLAs and funclets isn't really any different from having only one of them.)

Deleted test/CodeGen/AArch64/seh-localescape.ll as localescape testing is better covered under the updated test/CodeGen/AArch64/seh-finally.ll.

mgrang retitled this revision from "[COFF, ARM64] Fix localaddress to handle stack realignment" to "[COFF, ARM64] Fix localaddress to handle stack realignment and variable size objects". Jan 30 2019, 4:33 PM

With this patch, most of the SEH tests in https://github.com/Microsoft/windows_seh_tests/blob/master/src/xcpt4/xcpt4u.c pass.

mgrang added a reviewer: TomTan. Jan 30 2019, 4:49 PM

In D57183#1378121, @mgrang wrote:

With this patch, most of the SEH tests in https://github.com/Microsoft/windows_seh_tests/blob/master/src/xcpt4/xcpt4u.c pass.

Great to have this. Thanks @mgrang. Curious on what remaining tests fail in xcpt4u.c?

LGTM. I think this is the correct model for representing frame offsets for localescape/localrecover.

(The code to handle eh_recoverfp correctly is still missing, but I think it's okay to handle that in a separate patch.)

Please give Reid a few days to comment before you merge, though.

This revision is now accepted and ready to land. Jan 30 2019, 5:50 PM

I see, we need a separate register selection codepath that ignores frame size considerations. (I could be wrong, I haven't dug that deep, though.)

I think the functionality is good, but I keep insisting that we name these routines after the localescape / localrecover intrinsic set, since they are ostensibly for lambdas as well as SEH. :) Feel free to use your judgement; don't wait for me to review it.

include/llvm/CodeGen/TargetFrameLowering.h
Line 267 (on Diff #184403): Let's call this `getNonLocalFrameIndexReference`. In theory, this should have nothing to do with EH. llvm.localescape is a separate LLVM IR feature that happens to support the frontend-outlined try / except / __finally funclets.
lib/Target/AArch64/AArch64RegisterInfo.h
Line 125 (on Diff #184403): Similarly, perhaps this should be `getLocalAddressRegister` so it's associated with the intrinsics.

mgrang updated this revision to Diff 184818. Feb 1 2019, 12:39 PM

mgrang marked 2 inline comments as done.

Closed by commit rL352923: [COFF, ARM64] Fix localaddress to handle stack realignment and variable size… (authored by mgrang). Feb 1 2019, 1:41 PM

This revision was automatically updated to reflect the committed changes.

Herald added a project: Restricted Project. Feb 1 2019, 1:41 PM

In D57183#1378139, @TomTan wrote:

In D57183#1378121, @mgrang wrote:

With this patch, most of the SEH tests in https://github.com/Microsoft/windows_seh_tests/blob/master/src/xcpt4/xcpt4u.c pass.

Great to have this. Thanks @mgrang. Curious on what remaining tests fail in xcpt4u.c?

@TomTan Tests with features which have not yet been implemented (like __leave and asynchronous SEH) are failing.

Revision Contents

Path                                          Size
lib/Target/AArch64/AArch64ISelLowering.cpp    16 lines
test/CodeGen/AArch64/seh-localaddress.ll      55 lines

Diff 183381

lib/Target/AArch64/AArch64ISelLowering.cpp

In SDValue AArch64TargetLowering::LowerINTRINSIC_WO_CHAIN(SDValue Op,

Context (identical on both sides):

  case Intrinsic::aarch64_neon_smin:
    return DAG.getNode(ISD::SMIN, dl, Op.getValueType(),
                       Op.getOperand(1), Op.getOperand(2));
  case Intrinsic::aarch64_neon_umin:
    return DAG.getNode(ISD::UMIN, dl, Op.getValueType(),
                       Op.getOperand(1), Op.getOperand(2));

Left side:

  case Intrinsic::localaddress: {
    // Returns one of the stack, base, or frame pointer registers, depending on
    // which is used to reference local variables.
    MachineFunction &MF = DAG.getMachineFunction();
    const AArch64RegisterInfo *RegInfo = Subtarget->getRegisterInfo();
    unsigned Reg;
    if (RegInfo->hasBasePointer(MF))
      Reg = RegInfo->getBaseRegister();
    else // This function handles the SP or FP case.
      Reg = RegInfo->getFrameRegister(MF);
    return DAG.getCopyFromReg(DAG.getEntryNode(), dl, Reg,
                              Op.getSimpleValueType());
  }

Right side:

  case Intrinsic::localaddress: {
    auto &MF = DAG.getMachineFunction();
    const auto *RegInfo = Subtarget->getRegisterInfo();
    auto Reg = RegInfo->needsStackRealignment(MF) ?
               RegInfo->getBaseRegister() :
               RegInfo->getFrameRegister(MF);
    return DAG.getCopyFromReg(DAG.getEntryNode(), dl, Reg,
                              Op.getSimpleValueType());
  }

Inline comment on the right side, at the needsStackRealignment line:
efriedma: This could use a brief comment to explain why stack realignment matters.

Context (identical on both sides):

  case Intrinsic::eh_recoverfp: {
    // FIXME: This needs to be implemented to correctly handle highly aligned
    // stack objects. For now we simply return the incoming FP. Refer D53541
    // for more details.

test/CodeGen/AArch64/seh-localaddress.ll

This file was added.

; RUN: llc -mtriple arm64-windows -o - %s | FileCheck %s

; struct S { int x; };
; void foo() {
;   struct S __declspec(align(32)) o;
;   __try { o.x; }
;   __finally { o.x; }
; }
; void bar() {
;   struct S o;
;   __try { o.x; }
;   __finally { o.x; }
; }

%struct.S = type { i32 }

define dso_local void @"?foo@@YAXXZ"() #0 {
entry:
; CHECK-LABEL: foo
; CHECK: mov x1, x19
; CHECK-NOT: mov x1, x29

  %o = alloca %struct.S, align 32
  call void (...) @llvm.localescape(%struct.S* %o)
  %x = getelementptr inbounds %struct.S, %struct.S* %o, i32 0, i32 0
  %0 = call i8* @llvm.localaddress()
  call void @"?fin$0@0@foo@@"(i8 0, i8* %0)
  ret void
}

define dso_local void @"?bar@@YAXXZ"() {
entry:
; CHECK-LABEL: bar
; CHECK: mov x1, x29
; CHECK-NOT: mov x1, x19

  %o = alloca %struct.S, align 4
  call void (...) @llvm.localescape(%struct.S* %o)
  %x = getelementptr inbounds %struct.S, %struct.S* %o, i32 0, i32 0
  %0 = call i8* @llvm.localaddress()
  call void @"?fin$0@0@bar@@"(i8 0, i8* %0)
  ret void
}

declare void @"?fin$0@0@foo@@"(i8 %abnormal_termination, i8* %frame_pointer)

declare void @"?fin$0@0@bar@@"(i8 %abnormal_termination, i8* %frame_pointer)

declare i8* @llvm.localrecover(i8, i8, i32)

declare i8* @llvm.localaddress()

declare void @llvm.localescape(...)

attributes #0 = { noinline optnone uwtable }