This is an archive of the discontinued LLVM Phabricator instance.


Differential D86527

[3/5] [MC] [Win64EH] Produce well-formed xdata records when info is missing
ClosedPublic

Authored by mstorsjo on Aug 25 2020, 4:34 AM.

Details

Reviewers

efriedma
ssijaric
TomTan

Commits

rG37ef743cbf3f: [MC] [Win64EH] Avoid producing malformed xdata records

Summary

If there are no unwinding opcodes, omit writing the xdata/pdata records.

If writing of an xdata record is forced via the .seh_handlerdata directive, make sure the xdata record is of a valid length even though it's bogus.

Previously, this generated truncated xdata records, and llvm-readobj would error out when trying to print them.
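
The truncation can be illustrated with a minimal sketch of the first xdata header word. The field positions are taken from the shifts in ARM64EmitUnwindInfo in this diff; the struct and the helper names (XDataHeader, packHeader, readerExpectsExtensionWord) are hypothetical and are not LLVM APIs. The sketch assumes E is always 0, as the code in this diff does.

```cpp
#include <cstdint>

// Simplified stand-in for the first .xdata header word on ARM64.
// Bits 0-17: function length, bit 20: X, bits 22-26: epilog count,
// bits 27-31: code words (positions as used in ARM64EmitUnwindInfo).
struct XDataHeader {
  uint32_t Row1;
  bool HasExtensionWord;
};

// Hypothetical helper mirroring the patched ExtensionWord condition.
inline XDataHeader packHeader(uint32_t FuncLength, uint32_t EpilogCount,
                              uint32_t TotalCodeBytes,
                              bool HandlesExceptions) {
  uint32_t CodeWords = (TotalCodeBytes + 3) / 4; // round up to whole words
  bool Ext = EpilogCount > 31 || TotalCodeBytes > 124 ||
             (EpilogCount == 0 && CodeWords == 0);
  uint32_t Row1 = FuncLength & 0x3FFFF;
  if (HandlesExceptions)
    Row1 |= 1u << 20;
  if (!Ext) {
    Row1 |= (EpilogCount & 0x1F) << 22;
    Row1 |= (CodeWords & 0x1F) << 27;
  }
  return {Row1, Ext};
}

// Reader-side rule: when both small fields are zero, an extension word
// is expected to follow the first header word.
inline bool readerExpectsExtensionWord(uint32_t Row1) {
  uint32_t EpilogCount = (Row1 >> 22) & 0x1F;
  uint32_t CodeWords = (Row1 >> 27) & 0x1F;
  return EpilogCount == 0 && CodeWords == 0;
}
```

With zero epilogs and zero opcodes, both fields are zero, so a reader looks for an extension word. A writer that does not emit one in that case leaves the record one word short, which is the malformed output described above.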

Event Timeline

mstorsjo created this revision. Aug 25 2020, 4:34 AM

Herald added a project: Restricted Project. Aug 25 2020, 4:34 AM

Herald added a subscriber: hiraditya.

mstorsjo requested review of this revision. Aug 25 2020, 4:34 AM

mstorsjo added a parent revision: D86524: [2/5] [MC] [Win64EH] Update the AArch64/seh.s test slightly. NFC.

mstorsjo added a child revision: D86528: [4/5] [MC] [Win64EH] Fill in FuncletOrFuncEnd if missing. Aug 25 2020, 4:38 AM

Harbormaster completed remote builds in B69433: Diff 287629. Aug 25 2020, 5:16 AM

If there aren't any unwind codes, the compiler should completely skip emitting the .pdata record. That's what the spec suggests the compiler should do, and probably matches what tooling expects. So I think if we do end up with SEH directives without any SEH opcodes, we should either omit the .pdata record, or reject the assembly.

In D86527#2237241, @efriedma wrote:

If there aren't any unwind codes, the compiler should completely skip emitting the .pdata record. That's what the spec suggests the compiler should do, and probably matches what tooling expects. So I think if we do end up with SEH directives without any SEH opcodes, we should either omit the .pdata record, or reject the assembly.

Fair enough. In most cases it's straightforward to do that, but there's one main exception. There's a .seh_handlerdata directive, and on that, it triggers emission of the xdata record right away. And when emitting the xdata record, the FrameInfo gets cleared of the opcodes, so it's not entirely evident at that point whether the xdata already emitted was bogus or not.

Or I guess one possible solution to that would be to add another flag to FrameInfo after the xdata was written, indicating whether it was sensible or not. (Or just clear the info->Symbol member to pretend that it wasn't ever emitted, even if there's a few orphaned bytes in the xdata section?)

Avoid writing the xdata/pdata entries where easily possible, but kept the code for avoiding malformed entries for the cases where we are forced to emit them.

In D86527#2238067, @mstorsjo wrote:

In D86527#2237241, @efriedma wrote:

If there aren't any unwind codes, the compiler should completely skip emitting the .pdata record. That's what the spec suggests the compiler should do, and probably matches what tooling expects. So I think if we do end up with SEH directives without any SEH opcodes, we should either omit the .pdata record, or reject the assembly.

Fair enough. In most cases it's straightforward to do that, but there's one main exception. There's a .seh_handlerdata directive, and on that, it triggers emission of the xdata record right away. And when emitting the xdata record, the FrameInfo gets cleared of the opcodes, so it's not entirely evident at that point whether the xdata already emitted was bogus or not.

Or I guess one possible solution to that would be to add another flag to FrameInfo after the xdata was written, indicating whether it was sensible or not. (Or just clear the info->Symbol member to pretend that it wasn't ever emitted, even if there's a few orphaned bytes in the xdata section?)

I don't understand how we end up in this position in the first place. Why does .seh_handlerdata need to trigger the emission of unwind data? Why is it okay if the resulting unwind data is nonsense?

In D86527#2240344, @efriedma wrote:

I don't understand how we end up in this position in the first place. Why does .seh_handlerdata need to trigger the emission of unwind data?

Because .seh_handlerdata changes the active section to the xdata section, emits the unwind info record, letting you append handler specific data which is supposed to follow directly after the unwind data itself. (See the testcase in this patch.)

Why is it okay if the resulting unwind data is nonsense?

Well it's at least parsable, but nonsense, which is better than unparseable records.

But I don't believe this case, .seh_handlerdata without any valid opcodes, really happens in the wild, so the exact handling doesn't matter much. Functions with just .seh_proc/.seh_endproc do occur though, but those are easy to omit and handle cleanly (which the current revision of the patch does).

For the contrived .seh_handlerdata case, we could avoid outputting the unwind info itself, leave the orphaned handler specific data in the section, and not hook up the pdata entry. Can we output warnings from this layer?

llvm/test/MC/AArch64/seh.s
Line 97: The `.long 0` here goes in the xdata section, appended after the unwind info.

letting you append handler specific data which is supposed to follow directly after the unwind data itself

Oh, that's the part I was missing; thanks. So in well-formed code, .seh_handlerdata should come after an .seh_endprologue, and there shouldn't be any .seh_* directives or instructions between the .seh_handlerdata and the .seh_endproc?

For the contrived .seh_handlerdata case, we could avoid outputting the unwind info itself, leave the orphaned handler specific data in the section, and not hook up the pdata entry.

That probably makes the most sense, yes.

Can we output warnings from this layer?

Technically yes, but you've lost the source location by the time you get this deep, so it wouldn't be pretty. Probably we should do some primitive tracking of the SEH state in the asmparser, and emit a warning from there.

In D86527#2240604, @efriedma wrote:

letting you append handler specific data which is supposed to follow directly after the unwind data itself

Oh, that's the part I was missing; thanks. So in well-formed code, .seh_handlerdata should come after an .seh_endprologue, and there shouldn't be any .seh_* directives or instructions between the .seh_handlerdata and the .seh_endproc?

It's actually even a bit stricter/worse than that. Not only does the xdata record contain the unwind opcodes themselves, but it also contains the function length field. So ideally .seh_handlerdata comes after .seh_endfunclet, so that the full function length is known.

In practice, there can be cases where .seh_handlerdata comes before .seh_endfunclet (or functions without that altogether), and then we need to set the function length up to the current point, as I do in D86528. This means that the actual unwindable region of the function only is up to this point. So if we have .seh_handlerdata directly after the prologue, one can't actually unwind from the body of the function, only within the prologue itself. So ideally .seh_handlerdata really should be as far to the end of the function as possible.
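
A small sketch of why an early .seh_handlerdata shrinks the unwindable region. It assumes the ARM64 xdata convention that the 18-bit function length field stores the length in bytes divided by 4; the helper names and the byte offsets in the usage note are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model: all offsets are in bytes from the function start.

// Length field recorded if the xdata record is emitted when the assembler
// has reached EmitOffsetBytes (for example at an early .seh_handlerdata).
inline uint32_t encodeFuncLength(uint32_t EmitOffsetBytes) {
  return (EmitOffsetBytes / 4) & 0x3FFFF;
}

// A PC offset is covered by the record only if it lies below the
// recorded length.
inline bool isCovered(uint32_t EncodedLength, uint32_t PcOffsetBytes) {
  return PcOffsetBytes < EncodedLength * 4;
}
```

For a 40-byte function whose prologue ends at offset 8, emitting the record right after the prologue gives encodeFuncLength(8) == 2, so offset 16 in the body is not covered. Emitting it at the end gives encodeFuncLength(40) == 10, which covers the body.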

Then there's real world messes like https://github.com/mingw-w64/mingw-w64/blob/master/mingw-w64-crt/crt/crtexe.c#L179-L198, where .seh_handlerdata is injected via inline assembly in C code. This works fine in x86_64, because the function length itself isn't embedded in the xdata record, but is handled via the BeginAddress/EndAddress pair in the pdata record. But for the aarch64 case, that code needs to be adjusted to move the .seh_handlerdata bit to the end of the function. (I'll try to get that fixed after these patches settle.) It won't cover the epilogue of the function, but would at least cover the body.

For the contrived .seh_handlerdata case, we could avoid outputting the unwind info itself, leave the orphaned handler specific data in the section, and not hook up the pdata entry.

That probably makes the most sense, yes.

Ok, will try to do that then.

Can we output warnings from this layer?

Technically yes, but you've lost the source location by the time you get this deep, so it wouldn't be pretty. Probably we should do some primitive tracking of the SEH state in the asmparser, and emit a warning from there.

Even without the source location, just giving the function name might be context enough - you'd probably only have this in cases with assembly involved anyway. But it's probably not necessary.

Now omitting empty unwind info even when triggered by a .seh_handlerdata directive. Added an error for the case where such a skipped entry (one that was supposed to be emitted, but wasn't) later gets more unwind info, which should prompt users to move the .seh_handlerdata directive later in the function if that ever happens.
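
The behaviour described in this update can be sketched roughly as follows. This is a simplified model and not the committed code: Frame, EmitResult, emitUnwindInfo and the diagnostic text are hypothetical stand-ins, and the Emitted flag plays the role of "Symbol is set" in the real FrameInfo.

```cpp
#include <string>
#include <vector>

// Simplified stand-in for WinEH::FrameInfo; all names are hypothetical.
struct Frame {
  std::string Function;
  std::vector<int> Opcodes;   // stand-in for the unwind opcodes
  bool EmitAttempted = false; // an empty record was requested and skipped
  bool Emitted = false;       // stand-in for "Symbol is set"
};

enum class EmitResult { Skipped, Emitted, Error };

inline EmitResult emitUnwindInfo(Frame &F, std::string &Diag) {
  if (F.Emitted)
    return EmitResult::Emitted; // record already written earlier
  if (F.Opcodes.empty()) {
    // Nothing sensible to write; remember that emission was requested.
    F.EmitAttempted = true;
    return EmitResult::Skipped;
  }
  if (F.EmitAttempted) {
    // Handler data was already appended without a record before it, so
    // writing the record now would put it in the wrong place.
    Diag = "unwind info for " + F.Function +
           " was skipped earlier; .seh_handlerdata too early?";
    return EmitResult::Error;
  }
  F.Opcodes.clear(); // emission consumes the opcodes, as noted above
  F.Emitted = true;
  return EmitResult::Emitted;
}
```

Returning a result code instead of reporting through the MC context keeps the sketch self-contained; only the ordering of the three checks is the point here.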

LGTM

This revision is now accepted and ready to land. Aug 27 2020, 2:14 PM

Closed by commit rG37ef743cbf3f: [MC] [Win64EH] Avoid producing malformed xdata records (authored by mstorsjo). Aug 27 2020, 11:24 PM

This revision was automatically updated to reflect the committed changes.

mstorsjo added a commit: rG37ef743cbf3f: [MC] [Win64EH] Avoid producing malformed xdata records.

Revision Contents

Path	Size
llvm/include/llvm/MC/MCWinEH.h	9 lines
llvm/lib/MC/MCWin64EH.cpp	21 lines
llvm/test/MC/AArch64/seh.s	30 lines

Diff 287866

llvm/include/llvm/MC/MCWinEH.h

[unchanged lines omitted]	struct FrameInfo {

FrameInfo() = default;		FrameInfo() = default;
FrameInfo(const MCSymbol *Function, const MCSymbol *BeginFuncEHLabel)		FrameInfo(const MCSymbol *Function, const MCSymbol *BeginFuncEHLabel)
: Begin(BeginFuncEHLabel), Function(Function) {}		: Begin(BeginFuncEHLabel), Function(Function) {}
FrameInfo(const MCSymbol *Function, const MCSymbol *BeginFuncEHLabel,		FrameInfo(const MCSymbol *Function, const MCSymbol *BeginFuncEHLabel,
const FrameInfo *ChainedParent)		const FrameInfo *ChainedParent)
: Begin(BeginFuncEHLabel), Function(Function),		: Begin(BeginFuncEHLabel), Function(Function),
ChainedParent(ChainedParent) {}		ChainedParent(ChainedParent) {}

		bool empty() const {
		if (!Instructions.empty())
		return false;
		for (const auto &E : EpilogMap)
		if (!E.second.empty())
		return false;
		return true;
		}
};		};

class UnwindEmitter {		class UnwindEmitter {
public:		public:
virtual ~UnwindEmitter();		virtual ~UnwindEmitter();

/// This emits the unwind info sections (.pdata and .xdata in PE/COFF).		/// This emits the unwind info sections (.pdata and .xdata in PE/COFF).
virtual void Emit(MCStreamer &Streamer) const = 0;		virtual void Emit(MCStreamer &Streamer) const = 0;
virtual void EmitUnwindInfo(MCStreamer &Streamer, FrameInfo *FI) const = 0;		virtual void EmitUnwindInfo(MCStreamer &Streamer, FrameInfo *FI) const = 0;
};		};
}		}
}		}

#endif		#endif

llvm/lib/MC/MCWin64EH.cpp

[unchanged lines omitted]	static void ARM64EmitUnwindInfo(MCStreamer &streamer, WinEH::FrameInfo *info) {

// Code Words, Epilog count, E, X, Vers, Function Length		// Code Words, Epilog count, E, X, Vers, Function Length
uint32_t row1 = 0x0;		uint32_t row1 = 0x0;
uint32_t CodeWords = TotalCodeBytes / 4;		uint32_t CodeWords = TotalCodeBytes / 4;
uint32_t CodeWordsMod = TotalCodeBytes % 4;		uint32_t CodeWordsMod = TotalCodeBytes % 4;
if (CodeWordsMod)		if (CodeWordsMod)
CodeWords++;		CodeWords++;
uint32_t EpilogCount = info->EpilogMap.size();		uint32_t EpilogCount = info->EpilogMap.size();
bool ExtensionWord = EpilogCount > 31 || TotalCodeBytes > 124;		// If we need to signal a larger EpilogCount or TotalCodeBytes, we need to
		// use the extension word, and leave those fields as zero in row1.
		// If both EpilogCount and CodeWords actually are zero though (e.g. due to
		// missing/incomplete .seh directives in assembly), we still need to include
		// the extension word, as that is how the reader will interpret it.
		bool ExtensionWord = EpilogCount > 31 || TotalCodeBytes > 124 ||
		(EpilogCount == 0 && CodeWords == 0);
if (!ExtensionWord) {		if (!ExtensionWord) {
row1 |= (EpilogCount & 0x1F) << 22;		row1 |= (EpilogCount & 0x1F) << 22;
row1 |= (CodeWords & 0x1F) << 27;		row1 |= (CodeWords & 0x1F) << 27;
}		}
// E is always 0 right now, TODO: packed epilog setup		// E is always 0 right now, TODO: packed epilog setup
if (info->HandlesExceptions) // X		if (info->HandlesExceptions) // X
row1 |= 1 << 20;		row1 |= 1 << 20;
row1 |= FuncLength & 0x3FFFF;		row1 |= FuncLength & 0x3FFFF;
[unchanged lines omitted]	streamer.emitValue(MCSymbolRefExpr::create(info->Symbol,
MCSymbolRefExpr::VK_COFF_IMGREL32,		MCSymbolRefExpr::VK_COFF_IMGREL32,
context),		context),
4);		4);
}		}

void llvm::Win64EH::ARM64UnwindEmitter::Emit(MCStreamer &Streamer) const {		void llvm::Win64EH::ARM64UnwindEmitter::Emit(MCStreamer &Streamer) const {
// Emit the unwind info structs first.		// Emit the unwind info structs first.
for (const auto &CFI : Streamer.getWinFrameInfos()) {		for (const auto &CFI : Streamer.getWinFrameInfos()) {
		WinEH::FrameInfo *Info = CFI.get();
		if (Info->empty())
		continue;
MCSection *XData = Streamer.getAssociatedXDataSection(CFI->TextSection);		MCSection *XData = Streamer.getAssociatedXDataSection(CFI->TextSection);
Streamer.SwitchSection(XData);		Streamer.SwitchSection(XData);
ARM64EmitUnwindInfo(Streamer, CFI.get());		ARM64EmitUnwindInfo(Streamer, Info);
}		}

// Now emit RUNTIME_FUNCTION entries.		// Now emit RUNTIME_FUNCTION entries.
for (const auto &CFI : Streamer.getWinFrameInfos()) {		for (const auto &CFI : Streamer.getWinFrameInfos()) {
		WinEH::FrameInfo *Info = CFI.get();
		// ARM64EmitUnwindInfo above clears the info struct, so we can't check
		// empty here. But if a Symbol is set, we should create the corresponding
		// pdata entry.
		if (!Info->Symbol)
		continue;
MCSection *PData = Streamer.getAssociatedPDataSection(CFI->TextSection);		MCSection *PData = Streamer.getAssociatedPDataSection(CFI->TextSection);
Streamer.SwitchSection(PData);		Streamer.SwitchSection(PData);
ARM64EmitRuntimeFunction(Streamer, CFI.get());		ARM64EmitRuntimeFunction(Streamer, Info);
}		}
}		}

void llvm::Win64EH::ARM64UnwindEmitter::EmitUnwindInfo(		void llvm::Win64EH::ARM64UnwindEmitter::EmitUnwindInfo(
MCStreamer &Streamer, WinEH::FrameInfo *info) const {		MCStreamer &Streamer, WinEH::FrameInfo *info) const {
// Switch sections (the static function above is meant to be called from		// Switch sections (the static function above is meant to be called from
// here and from Emit().		// here and from Emit().
MCSection *XData = Streamer.getAssociatedXDataSection(info->TextSection);		MCSection *XData = Streamer.getAssociatedXDataSection(info->TextSection);
Streamer.SwitchSection(XData);		Streamer.SwitchSection(XData);
ARM64EmitUnwindInfo(Streamer, info);		ARM64EmitUnwindInfo(Streamer, info);
}		}

llvm/test/MC/AArch64/seh.s

// This test checks that the SEH directives don't cause the assembler to fail.		// This test checks that the SEH directives don't cause the assembler to fail.
		// Checking that llvm-readobj doesn't bail out on the unwind data, but not
		// really checking the contents yet.

// RUN: llvm-mc -triple aarch64-pc-win32 -filetype=obj %s | llvm-readobj -S -r - | FileCheck %s		// RUN: llvm-mc -triple aarch64-pc-win32 -filetype=obj %s | llvm-readobj -S -r -u - | FileCheck %s

// CHECK: Sections [		// CHECK: Sections [
// CHECK: Section {		// CHECK: Section {
// CHECK: Name: .text		// CHECK: Name: .text
// CHECK: RelocationCount: 0		// CHECK: RelocationCount: 0
// CHECK: Characteristics [		// CHECK: Characteristics [
// CHECK-NEXT: ALIGN_4BYTES		// CHECK-NEXT: ALIGN_4BYTES
// CHECK-NEXT: CNT_CODE		// CHECK-NEXT: CNT_CODE
// CHECK-NEXT: MEM_EXECUTE		// CHECK-NEXT: MEM_EXECUTE
// CHECK-NEXT: MEM_READ		// CHECK-NEXT: MEM_READ
// CHECK-NEXT: ]		// CHECK-NEXT: ]
// CHECK-NEXT: }		// CHECK-NEXT: }
// CHECK: Section {		// CHECK: Section {
// CHECK: Name: .xdata		// CHECK: Name: .xdata
// CHECK: RawDataSize: 20		// CHECK: RawDataSize: 32
// CHECK: RelocationCount: 1		// CHECK: RelocationCount: 2
// CHECK: Characteristics [		// CHECK: Characteristics [
// CHECK-NEXT: ALIGN_4BYTES		// CHECK-NEXT: ALIGN_4BYTES
// CHECK-NEXT: CNT_INITIALIZED_DATA		// CHECK-NEXT: CNT_INITIALIZED_DATA
// CHECK-NEXT: MEM_READ		// CHECK-NEXT: MEM_READ
// CHECK-NEXT: ]		// CHECK-NEXT: ]
// CHECK-NEXT: }		// CHECK-NEXT: }
// CHECK: Section {		// CHECK: Section {
// CHECK: Name: .pdata		// CHECK: Name: .pdata
// CHECK: RelocationCount: 4		// CHECK: RelocationCount: 4
// CHECK: Characteristics [		// CHECK: Characteristics [
// CHECK-NEXT: ALIGN_4BYTES		// CHECK-NEXT: ALIGN_4BYTES
// CHECK-NEXT: CNT_INITIALIZED_DATA		// CHECK-NEXT: CNT_INITIALIZED_DATA
// CHECK-NEXT: MEM_READ		// CHECK-NEXT: MEM_READ
// CHECK-NEXT: ]		// CHECK-NEXT: ]
// CHECK-NEXT: }		// CHECK-NEXT: }
// CHECK-NEXT: ]		// CHECK-NEXT: ]

// CHECK-NEXT: Relocations [		// CHECK-NEXT: Relocations [
// CHECK-NEXT: Section (4) .xdata {		// CHECK-NEXT: Section (4) .xdata {
// CHECK-NEXT: 0x8 IMAGE_REL_ARM64_ADDR32NB __C_specific_handler		// CHECK-NEXT: 0x8 IMAGE_REL_ARM64_ADDR32NB __C_specific_handler
		// CHECK-NEXT: 0x18 IMAGE_REL_ARM64_ADDR32NB __C_specific_handler
// CHECK-NEXT: }		// CHECK-NEXT: }
// CHECK-NEXT: Section (5) .pdata {		// CHECK-NEXT: Section (5) .pdata {
// CHECK-NEXT: 0x0 IMAGE_REL_ARM64_ADDR32NB func		// CHECK-NEXT: 0x0 IMAGE_REL_ARM64_ADDR32NB func
// CHECK-NEXT: 0x4 IMAGE_REL_ARM64_ADDR32NB .xdata		// CHECK-NEXT: 0x4 IMAGE_REL_ARM64_ADDR32NB .xdata
// CHECK-NEXT: 0x8 IMAGE_REL_ARM64_ADDR32NB smallFunc		// CHECK-NEXT: 0x8 IMAGE_REL_ARM64_ADDR32NB handlerFunc
// CHECK-NEXT: 0xC IMAGE_REL_ARM64_ADDR32NB .xdata		// CHECK-NEXT: 0xC IMAGE_REL_ARM64_ADDR32NB .xdata
// CHECK-NEXT: }		// CHECK-NEXT: }
// CHECK-NEXT: ]		// CHECK-NEXT: ]


.text		.text
.globl func		.globl func
.def func		.def func
[unchanged lines omitted]	func:
.seh_handler __C_specific_handler, @except		.seh_handler __C_specific_handler, @except
.seh_handlerdata		.seh_handlerdata
.long 0		.long 0
.text		.text
add sp, sp, #24		add sp, sp, #24
ret		ret
.seh_endproc		.seh_endproc

// Test emission of small functions.		// Function with no .seh directives; no pdata/xdata entries are
		// generated.
.globl smallFunc		.globl smallFunc
.def smallFunc		.def smallFunc
.scl 2		.scl 2
.type 32		.type 32
.endef		.endef
.seh_proc smallFunc		.seh_proc smallFunc
smallFunc:		smallFunc:
ret		ret
.seh_endproc		.seh_endproc

		// Function with no .seh directives, but with .seh_handlerdata, which
		// forces generating the xdata/pdata entries.
		.globl handlerFunc
		.def handlerFunc
		.scl 2
		.type 32
		.endef
		.seh_proc handlerFunc
		handlerFunc:
		ret
		.seh_handler __C_specific_handler, @except
		.seh_handlerdata
		.long 0
		.text
		.seh_endproc