This is an archive of the discontinued LLVM Phabricator instance.

[x86_64] Don't truncate PC-relative relocations in ELF EH frames to 32 bits
ClosedPublic

Authored by akiss on Nov 3 2014, 12:44 AM.

Details

Summary

When using LLVM in WebKit's FTL JIT (on x86_64/Linux/ELF), WebKit crashes with errors like:

ASSERTION FAILED: cfiLength
../../Source/JavaScriptCore/ftl/FTLUnwindInfo.cpp(430) : void JSC::FTL::{anonymous}::findFDE(uintptr_t, uintptr_t, uint32_t, JSC::FTL::{anonymous}::FDE_Info*, JSC::FTL::{anonymous}::CIE_Info*)
1 0x7f5bd4ceabf4 /home/akiss/devel/WebKit/WebKitBuild/Debug/lib/libjavascriptcoregtk-4.0.so.18(WTFCrash+0x1e) [0x7f5bd4ceabf4]
2 0x7f5bd4c94b05 /home/akiss/devel/WebKit/WebKitBuild/Debug/lib/libjavascriptcoregtk-4.0.so.18(+0x12ceb05) [0x7f5bd4c94b05]
3 0x7f5bd4c952b3 /home/akiss/devel/WebKit/WebKitBuild/Debug/lib/libjavascriptcoregtk-4.0.so.18(_ZN3JSC3FTL10UnwindInfo5parseEPvmPFlPNS_9ExecStateEE+0xa9) [0x7f5bd4c952b3]

The root cause of the errors turns out to be that the addresses in the Frame Description Entries (FDEs) of the EH frames are encoded in 32-bit PC-relative format. However, quite often the EH frames can get far away in the memory from the associated code and so the correct relative addresses cannot be computed/stored in 32 bits.

This patch defines the address encoding on x86_64 to be dwarf::DW_EH_PE_pcrel, which is 8 bytes there (instead of dwarf::DW_EH_PE_pcrel | dwarf::DW_EH_PE_sdata4, which is explicitly 4 bytes) so that the relative addresses don't get truncated.

Diff Detail

Repository
rL LLVM

Event Timeline

akiss updated this revision to Diff 15685.Nov 3 2014, 12:44 AM
akiss retitled this revision from to [x86_64] Don't truncate PC-relative relocations in ELF EH frames to 32 bits.
akiss updated this object.
akiss edited the test plan for this revision. (Show Details)
akiss added a reviewer: joerg.
akiss added a subscriber: Unknown Object (MLST).
joerg edited edge metadata.Nov 3 2014, 3:59 AM

I don't exactly understand what your constraints are. Do you map text and data segments at completely separate locations? For a normal (non-JITted) binary, all segments are more or less consecutive, so differences are always less than 32bit signed for binaries smaller than 2GB. I do not want to change the default as it will increase the encoding for everyone.

WebKit does JIT and yes, the sections get quite far away from each other, most of the time. The reason for this is that there are different allocators for the code and data sections.
http://trac.webkit.org/browser/trunk/Source/JavaScriptCore/ftl/FTLCompile.cpp?rev=173509#L53

I was "encouraged" to change the encoding to 64 bits by the mips64/mips64el solution.

lhames added a subscriber: lhames.Nov 3 2014, 3:59 PM

We can't turn this on my default (as in this patch), but based on discussion with Nick Kledzik I think it should probably be turned on for the large code model (which I believe FTL uses).

Do you know how this is being handled on Darwin? From a glance at InitMachOMCObjectFileInfo I'd have expected you to be seeing the same thing there?

echristo edited edge metadata.Nov 3 2014, 4:00 PM

I'm not sure this is correct, please hold off on committing.

Thanks.

-eric

joerg added a comment.Nov 3 2014, 4:54 PM

If you copy the condition from PersonalityEncoding, it should be fine.

Since we always have an augmentation that contains "R" the value of the initial_location field in the FDE should be encoded as the eh_frame_hdr section which, as far as I can tell, is pcrel | sdata4 irrespective of code model. (Also, the dwarf bits for mips for FDECFIEncoding look weird and are likely broken).

The data itself should be encoded as the PE pointer that Joerg mentioned and it should be as we've got it.

Yes this seems wonky.

What's going on in practice is that gcc (our canonical reference for svr4 amd64 abi ... for better or worse) is emitting using the encoding of the rest of the fde section here which is going to be the same as the personality encoding and the linker creates a header that says something else.

I think we're going to be ok with using the personality routine encoding here. However, I'd prefer this code be rewritten to use just an fde encoding, personality encoding, and lsda encoding rather than having a 4th encoding that just copies another one.

All of this aside, the JIT could also use static code generation and then just use absptr for the encodings as well :)

-eric

akiss updated this revision to Diff 15794.Nov 4 2014, 4:28 PM
akiss edited edge metadata.

Thanks for the feedbacks. Patch updated as requested by Eric.

ping
(as advised on mailing list)

Dear Eric & Joerg,

Have you had a chance to review the updated patch, by chance?

Thanks,
Akos

joerg added inline comments.Nov 18 2014, 5:34 AM
lib/MC/MCObjectFileInfo.cpp
278 ↗(On Diff #15794)

Why do we need the DW_EH_PE_indirect in the case of Small/Medium?

Thinking again, I think it should be:

Small/Medium: pcrel + sdata4
otherwise: pcrel + sdata8

akiss updated this revision to Diff 16346.Nov 18 2014, 11:16 AM

The original patch did not have DW_EH_PE_indirect either, so removed it now, as Joerg suggested.

joerg accepted this revision.Nov 21 2014, 2:39 AM
joerg edited edge metadata.
This revision is now accepted and ready to land.Nov 21 2014, 2:39 AM

Thanks for the accept. Since I have no commit rights, may I ask you to land it for me?

joerg closed this revision.Nov 21 2014, 6:43 AM
joerg updated this revision to Diff 16490.

Closed by commit rL222538 (authored by @joerg).

pitrou added a subscriber: pitrou.Mar 19 2015, 11:30 AM

Hello,

I see this issue has generated several changesets. The latest one seems to be rL222897, where only the Large code model avoids truncation. Am I correct in understanding that the JITDefault code model still generates truncations?

At Numba we've maintained a more encompassing patch and it allowed us to eliminate occasional crashes on x86-64 (we use the JITDefault code model):
https://github.com/sklam/llvm/commit/9c8d6d60a18f9a8c113cd4d490efe6f9de1f540a

Also there seem (?) to be sibling issues:
https://llvm.org/bugs/show_bug.cgi?id=21423
https://llvm.org/bugs/show_bug.cgi?id=15356

joerg added a comment.Mar 19 2015, 3:51 PM

The logic now should be "use 32bit iff small or medium code model is set".