This is an archive of the discontinued LLVM Phabricator instance.

Generate SEH unwinding info on Win64
ClosedPublic

Authored by vadimcn on Jun 9 2014, 9:27 PM.

Details

Summary

This patch enables LLVM to emit Win64-native SEH unwind info rather than DWARF CFI. It handles all corner cases (I hope), including stack realignment.
Because SEH is not flexible enough to describe stack frames with a gap of unknown size in the middle, such as the one caused by stack realignment, I modified register spilling code to place all spills into the fixed frame slots, so that they can be accessed relative to the frame pointer.

Diff Detail

Repository
rL LLVM

Event Timeline

vadimcn updated this revision to Diff 10261.Jun 9 2014, 9:27 PM
vadimcn retitled this revision from to Generate SEH unwinding info on Win64.
vadimcn updated this object.
vadimcn edited the test plan for this revision. (Show Details)
vadimcn added subscribers: Unknown Object (MLST), Kai, sanjoy, mingwandroid.
vadimcn updated this revision to Diff 10277.Jun 10 2014, 8:06 AM

Whoops, looks like I got .cfi_offset and .cfi_rel_offset mixed up. The original offsets in bigstructret2.ll test were actually correct.
Also, rebased on top of master.

vadimcn updated this revision to Diff 10279.Jun 10 2014, 8:29 AM
rnk added inline comments.Jun 10 2014, 11:53 AM
include/llvm/CodeGen/MachineFrameInfo.h
486 ↗(On Diff #10279)

Let's try to avoid contiguous boolean parameters, especially when some have default arguments. They make it hard to do refactorings like removing the 'Immutable' parameter here.

Instead, CreateFixedSpillSlot(Size, SPOffset) sounds like the right interface to me.

lib/MC/MCAsmStreamer.cpp
1138 ↗(On Diff #10279)

This should be a separate revert. The existing test case is incorrect, it was doing '.seh_savereg %rsi, 16' and disassembling that with %rbp.

lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
148 ↗(On Diff #10279)

I think this means we will start attempting to emit cleanups for x86_64-pc-windows-msvc, which we probably aren't ready for. Can we hold off on this change, or are you reasonably sure it's safe?

lib/Target/X86/X86FrameLowering.cpp
402–410 ↗(On Diff #10279)

Are these comments accurate when we have an FP *and* the stack needs realignment? It seems like we should move the win64-specific XMM CSR saves to happen before the stack realignment. This should be safe since win64 also gives us 16-byte stack alignment.

738–742 ↗(On Diff #10279)

The comment below about "Don't care about the rest of the stack allocation" should get merged in here. I spent a while trying to understand why this scary comment is OK, and the comment below explains it.

744 ↗(On Diff #10279)

We have range based for loops now, so this can be:

for (const CalleeSavedInfo &Info : CSI) { ... }
773 ↗(On Diff #10279)

Format. You can drop the braces.

910 ↗(On Diff #10279)

Seems like a superfluous formatting change

1092 ↗(On Diff #10279)

:(

Is TargetFrameLowering::getCalleeSavedSpillSlots() not powerful enough to express the offsets you need? If it isn't, I think the right way forward is to add a more flexible interface to TFL, like TFL::appendCalleeSavedSpillSlot(std::vector<CSI> &CSIs), where the default implementation forwards on to check 'TFL->getCalleeSavedSpillSlots(NumFixedSpillSlots)' so existing targets don't require changes.

lib/Target/X86/X86ISelLowering.cpp
600–603 ↗(On Diff #10279)

Can we flip this to the positive sense? What targets actually need it? win32 only?

vadimcn added inline comments.Jun 10 2014, 12:42 PM
include/llvm/CodeGen/MachineFrameInfo.h
486 ↗(On Diff #10279)

Will do.

lib/MC/MCAsmStreamer.cpp
1138 ↗(On Diff #10279)

I have it as a separate commit in git, it just got squashed when posting diff to Phabricator. Do you want me to actually submit it as a separate review?

lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
148 ↗(On Diff #10279)

Not sure I understand... Can you please elaborate?

lib/Target/X86/X86FrameLowering.cpp
402–410 ↗(On Diff #10279)

Stack realignment happens before stack pointer decrement, so XMM slots would be be in "invalid" zone at that point. We'd have to swap re-alignment and SP decrement in order to do what you suggest. I did not want to mess with existing frame setup order too much...
But I don't see the problem with the way it is now, because (re-alignment + SP decrement) is guaranteed to allocate at least as much space as SP decrement alone. And the xmm spill slots are accessed via rbp, which is set up before stack realignment, so no problem there either.

738–742 ↗(On Diff #10279)

okay

910 ↗(On Diff #10279)

okay, will undo

1092 ↗(On Diff #10279)

The problem with getCalleeSavedSpillSlots() is that it would reserve spill slots regardless of whether the register actually needs to be spilled, which would increase stack usage unnecessarily.

I was thinking of using hasReservedSpillSlot() for this purpose, because, in principle, it could create spill slots on-the-fly, but then we'd be relying on the order of register enumeration. Also, it'd need to store state somewhere, and the code overall would be more convoluted.

How about changing getCalleeSavedSpillSlots() signature to take non-const vector<CalleeSavedInfo>?

If that's too hacky, I'd propose to create TargetFrameLowering::assignSpillSlots(vector<CSI>& ) callback, that would allow to override PEI::calculateCalleeSavedRegisters() in a target-specific manner.

rnk added inline comments.Jun 10 2014, 1:03 PM
lib/MC/MCAsmStreamer.cpp
1138 ↗(On Diff #10279)

No, I can commit it.

lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
148 ↗(On Diff #10279)

Ignore this comment. I worry this will break 'clang-cl -m64' because we emit 'invoke' instructions in the frontend to run C++ destructors. Clang is currently relying on the fact that LLVM drops the landing pads on the floor and generates regular calls. Clang shouldn't be relying on that, though.

lib/Target/X86/X86FrameLowering.cpp
402–410 ↗(On Diff #10279)

OK, that makes sense. We may end up spilling xmm registers into the stack realignment gap, but that should be OK. Can you add a comment something like:

; It's OK if the stack was realigned and xmm registers are spilled into the realignment gap.
779 ↗(On Diff #10279)

This can also be range-based.

1092 ↗(On Diff #10279)

If that's too hacky, I'd propose to create TargetFrameLowering::assignSpillSlots(vector<CSI>& ) callback, that would allow to override PEI::calculateCalleeSavedRegisters() in a target-specific manner.

This is basically what I was trying to suggest. The default implementation should do what calculateCalleeSavedRegisters() does now, so you don't have to change every target.

compnerd added inline comments.
lib/MC/MCObjectFileInfo.cpp
635 ↗(On Diff #10279)

AFAIK, the COFF backend only supports Windows ATM. Why not make this an assert?

assert(T.isOSWindows() && "Windows is the only supported COFF target");
if (T.getArch() != Triple::x86_64) {
  LSDASection = ...
}
lib/Target/X86/MCTargetDesc/X86MCAsmInfo.cpp
168 ↗(On Diff #10279)

Convert to an assert and simplify?

I went and tried this out but ran into an assertion. This is odd because at that stage I wasn't actually using COFF, but rather ELF in memory for MCJIT. Maybe there's a check missing whether we are using COFF in addition to Win64? See below:

Program: C:\mingw-builds\msys64\home\kfischer\julia\usr\bin\julia.exe
File: C:/mingw-builds/msys64/home/kfischer/julia/deps/llvm-svn/lib/CodeGen/MachineInstr.cpp, Line 674

Expression: (isImpReg || Op.isRegMask() || MCID->isVariadic() || OpNo < MCID->getNumOperands() || isMetaDataOp) && "Trying to add an operand to a machine instr that is already done!"
In D4081#16, @loladiro wrote:

I went and tried this out but ran into an assertion. This is odd because at that stage I wasn't actually using COFF, but rather ELF in memory for MCJIT. Maybe there's a check missing whether we are using COFF in addition to Win64? See below:

Program: C:\mingw-builds\msys64\home\kfischer\julia\usr\bin\julia.exe
File: C:/mingw-builds/msys64/home/kfischer/julia/deps/llvm-svn/lib/CodeGen/MachineInstr.cpp, Line 674

Expression: (isImpReg || Op.isRegMask() || MCID->isVariadic() || OpNo < MCID->getNumOperands() || isMetaDataOp) && "Trying to add an operand to a machine instr that is already done!"

Did you do a clean build? I ran into something similar after adding SEH instructions, and the solution was to delete all .o and .inc files under Target\X86. Apparently make dependencies are not quite reliable.

I did not. Doing that now. Will report back. The make dependencies should really be figured out though.

Ok, now I'm getting "LLVM ERROR: No open Win64 EH frame function!" (again with ELF in the first stage).

Have you run the example at http://llvm.org/bugs/show_bug.cgi?id=19905 through llvm with this change in?

lib/Target/X86/X86FrameLowering.cpp
1131 ↗(On Diff #10279)

One nice effect of having a "TargetFrameLowering::assignSpillSlots" is that you won't have to create frame indices only to delete them i.e. you'd have CSR's spilled to fixed slots to begin with.

vadimcn added inline comments.Jun 13 2014, 6:18 PM
lib/Target/X86/X86ISelLowering.cpp
600–603 ↗(On Diff #10279)

This is to make sure that landing pads do get emitted for win64. Honestly, I have no idea what the inverse of this set looks like.

vadimcn updated this revision to Diff 10416.Jun 13 2014, 6:28 PM

Addressed review comments.

logan added a subscriber: logan.Jun 16 2014, 7:25 AM
rnk accepted this revision.Jun 18 2014, 1:07 PM
rnk edited edge metadata.

lgtm, do you want me to commit this?

lib/Target/X86/X86FrameLowering.cpp
1084 ↗(On Diff #10416)

indentation, clang-format can fix it.

1113 ↗(On Diff #10416)

s/GRPs/GPRs/

This revision is now accepted and ready to land.Jun 18 2014, 1:07 PM
vadimcn added inline comments.Jun 18 2014, 1:39 PM
lib/Target/X86/X86FrameLowering.cpp
1084 ↗(On Diff #10416)

Do you want me to run clang-format on all files I've changed?
I am a bit worried about doing that, because it reformats quite a bit more code beyond what I touched.

You can run clang-format from the VC IDE on selected (marked) text only.However, it may be a good idea to reformat all the file if it does not conform to clang format.

Reid, what do you think?

rnk added inline comments.Jun 18 2014, 2:04 PM
lib/Target/X86/X86FrameLowering.cpp
1084 ↗(On Diff #10416)

No, typically it's used interactively only on lines that are being changed. See git-clang-format. Anyway, I can do that if this is done.

Yes, please commit, if you don't mind running format yourself.

rnk added a comment.Jun 19 2014, 2:23 PM

I tried to commit today, but there were conflicts while rebasing. I attempted to resolve them, but I still had this test failure after that:

test/CodeGen/X86/x86-64-static-relo-movl.ll

Can you look into this? The test uses a x86_64-pc-win32-macho triple, which makes us hit "LLVM ERROR: No open Win64 EH frame function!". We probably shouldn't emit win64 style unwind info when generating MachO object files.

Is 'x86_64-pc-win32-macho' a valid target? Windows 64 ABI in OSX object file seems like a weird combination.

Not necessarily by itself, but it's not uncommon to have macho or elf in memory for MCJIT (as I mentioned above). Therefore, every effort should be made to make sure it doesn't break.

vadimcn updated this revision to Diff 10670.Jun 19 2014, 6:25 PM
vadimcn updated this object.
vadimcn edited edge metadata.

Fixed x86-64-static-relo-movl.ll and rebased.

rnk closed this revision.Jun 20 2014, 1:44 PM
rnk updated this revision to Diff 10704.

Closed by commit rL211399 (authored by @rnk).

Twobit added a subscriber: Twobit.Jun 22 2014, 9:59 PM

Looks like something wrong with .pdata segment. It contains Function start address == Function end address and not always sorted by start address.

In D4081#42, @Twobit wrote:

Looks like something wrong with .pdata segment. It contains Function start address == Function end address and not always sorted by start address.

@Twobit, do you have an example?

Yes, of course. I trying to compile main.ll and get main.obj. llvm toolchain is builded from " git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@211430 "


Hmm, something must be wrong at the MC layer. When I assemble the output of llc with gnu assembler (after replacing register numbers with names in seh directives), it seems to produce a correct .obj.

rnk added a comment.Jun 23 2014, 1:39 PM

This got reverted in r211480, you should probably look into it.

@chapuni, Which tests broke? Is there a PR?
I saw that MultiJitTest.JitPool fails, but it was failing before my change too, so I ignored it.

chapuni edited edge metadata.Jun 23 2014, 2:54 PM

I met entire JITTests.exe and lli.exe (and clang-interpreter) got broken, not only MultiJitTest.JitPool.

Command 0 Stderr:
pseudo instructions should be removed before code emission

UNREACHABLE executed at C:\bb-win7\msbuild-llvmclang-x64-msc17-DA\llvm-project\llvm\lib\Target\X86\X86CodeEmitter.cpp:1106!

FYI,
mingw x64 Release+Asserts http://bb.pgr.jp/builders/ninja-clang-x64-mingw64-RA/builds/3572
msc17 x64 Debug+Asserts http://bb.pgr.jp/builders/msbuild-llvmclang-x64-msc17-DA/builds/731

I could help investigating if you couldn't reproduce failures.

I was wrong with example above. Looks like the only problem is sorting and sometimes overlapping ranges of Function start & end. For example, in cases when exist two or more .text sections there is no way at all to sort function starts (in one .pdata section)
I think, solution is to create one .pdata (and maybe .xdata) for each exception info as MS VC compiler do

rnk added a comment.Jun 25 2014, 1:12 PM

This was relanded in r211691, so everything is good to go.

Twobit added a comment.Jul 3 2014, 8:45 AM

My small patch (can be applied with git or by hand) with workaround some of "invalid .pdata contributions" issues. It creates new pdata and xdata section for every function

rnk added a comment.Jul 3 2014, 8:05 PM

Where is the patch? I don't see it in my inbox. That sounds like the
correct solution: the individual .pdata sections should be comdat
associative with the individual .text sections.

Twobit added a comment.EditedJul 3 2014, 9:16 PM

Patch was attached to phabricator (
http://reviews.llvm.org/file/data/h36ma4imbeunwhbu56fp/PHID-FILE-2w25ko3anof4isbyyzka/pdata_patch.patch
). Also I attach it to this email.

the individual .pdata sections should be comdat associative with the individual .text sections.

Maybe yes or maybe not. Having only one element in .pdata totally eliminate
sort routine.
Anyway this is a temporary solution. I hope someone of llvm maintainers
reimplement this patch as it should

2014-07-04 9:05 GMT+06:00 Reid Kleckner <rnk@google.com>:

Where is the patch? I don't see it in my inbox. That sounds like the
correct solution: the individual .pdata sections should be comdat
associative with the individual .text sections.

http://reviews.llvm.org/D4081

Kai added a comment.Jul 12 2014, 3:46 PM

The patch solves the problem but need to be updated to latest LLVM changes.