This is an archive of the discontinued LLVM Phabricator instance.

Add ARM backend support for pagerando
Needs Review, Public

Authored by rinon on Sep 7 2017, 12:24 PM.

Details

Reviewers
javed.absar
Summary

Extend lowering to support POT-indirect address computation.

All calls between different bins (executable segments) must be lowered using
page offset table (POT) indirect address computation when pagerando is
enabled. This patch adds that indirection to ARMISelLowering. The
ARMPagerandoOptimizer pass then optimizes intra-bin calls back to direct
calls, since references within the same bin are guaranteed to have a
fixed PC-relative offset.

For example, the direct call to foo:

BL <ga:@foo>

will be selected as the following, assuming foo is placed in a randomly located
bin:

cp#0: foo(potoff)
cp#1: foo(binoff)

%foo_potoff = LDRi12 <cp#0>, 0
%foo_bin_addr = LDRrs %R9, %foo_potoff
%foo_binoff = LDRi12 <cp#1>, 0
%foo_addr = ADDrr %foo_bin_addr, %foo_binoff
BLX %foo_addr

Constant pool entry #0 is the offset within the POT of the entry for the bin
containing foo, and entry #1 is the offset of foo from the beginning of that bin.
Together these constant pool entries allow us to index the POT and add an offset
to compute the dynamic address of foo.
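
The address arithmetic behind this sequence can be modeled in a few lines of Python. This is only an illustrative sketch, not code from the patch: the POT is represented as a plain list of bin base addresses, and the name `resolve_call_target`, the 4-byte entry size, and all addresses are assumptions chosen for the example.

```python
POT_ENTRY_SIZE = 4  # assumption: 32-bit ARM, one pointer per POT entry


def resolve_call_target(pot, potoff, binoff):
    """Model of the lowered call sequence: index the POT, then add the bin offset.

    pot    -- list of bin base addresses (hypothetical stand-in for the table at R9)
    potoff -- byte offset of the callee's bin entry within the POT (cp#0)
    binoff -- offset of the callee from the start of its bin (cp#1)
    """
    bin_addr = pot[potoff // POT_ENTRY_SIZE]  # the LDRrs through %R9
    return bin_addr + binoff                  # the ADDrr feeding BLX


# Made-up layout: POT[0] holds the GOT address, later entries hold bin bases.
pot = [0x20000, 0x7A000, 0x31000]
foo_addr = resolve_call_target(pot, potoff=2 * POT_ENTRY_SIZE, binoff=0x124)
```

Because only the POT contents change when bins are moved, the two constant pool values stay fixed at link time in this model.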

Inside pagerando bins, global addresses not in the GOT are computed with
approximately the following instruction sequence:

cp#0: global(gotoff)

%got_addr = LDRi12 %R9, 0
%got_off_global = LDRi12 <cp#0>, 0
%global_addr = ADDrr %got_addr, %got_off_global

This sequence loads the address of the GOT into %got_addr from the first POT
entry. The GOT-relative offset of @global, loaded from the constant pool, is
then added to %got_addr, leaving the dynamic address of @global in
%global_addr.
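
As a rough model of this computation (a sketch under assumptions, not the patch's code), the following Python treats the POT as a list whose first element is the GOT address. The function name `global_address` and the example addresses are hypothetical; the 32-bit mask models register wraparound, since a GOT-relative offset may be negative.

```python
def global_address(pot, gotoff):
    """Model of the sequence: GOT address from POT[0] plus a GOT-relative offset."""
    got_addr = pot[0]                       # LDRi12 %R9, 0
    return (got_addr + gotoff) & 0xFFFFFFFF  # ADDrr, with 32-bit wraparound


# Made-up layout: GOT at 0x20000, one global 0x100 bytes before it.
pot = [0x20000, 0x7A000]
addr = global_address(pot, -0x100)
```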

Global addresses found in the GOT are loaded in the conventional way (using a
got_brel relocation on a constant pool entry), once the GOT address is loaded
from the first POT entry.

This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587)
is a first draft of the pagerando implementation described in
http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html.

Event Timeline

rinon created this revision.Sep 7 2017, 12:24 PM
rinon updated this revision to Diff 114244.Sep 7 2017, 12:57 PM

Remove old test for bug fix that has already been committed

rinon updated this revision to Diff 114744.Sep 11 2017, 6:28 PM
  • Fix style nits
rinon updated this revision to Diff 118517.Oct 10 2017, 5:48 PM

Handle function aliases when checking for pagerando targets

Function aliases should be treated as functions to determine if the target is in
a pagerando bin.

I'm mostly a linker person so I'm not best qualified to review the backend changes. I've tried to take a look from an overall toolchain perspective.

How does pagerando handle the Arm exception tables? I can't immediately see how they will work for binned functions. My understanding is that for each executable section the compiler will output a .ARM.exidx section with a SHF_LINK_ORDER dependency on the executable section. The .ARM.exidx section holds one entry per function, in the format | R_ARM_PREL31 offset to function (usually expressed as section symbol + offset) | inline unwinding instructions or R_ARM_PREL31 offset to .ARM.extab |. At link time, after SHF_LINK_ORDER has been applied, we are left with a single .ARM.exidx section containing a table of pc-relative offsets to functions, which the unwinder can binary search to find out how to unwind the stack when __cxa_throw is called.

We obviously can't use pc-relative offsets to the binned functions (unless we are willing to make the .ARM.exidx section read/write and have the loader update the table). The pc-relative offsets could instead point to the wrapper functions (no .ARM.exidx section for the section containing the binned functions, and a .ARM.exidx section for each section containing wrappers). However, the call to __cxa_throw would come from the binned function and not the wrapper, so the unwinder wouldn't be able to find the pc value of the __cxa_throw call site in the table unless the wrapper always tail-called.

My apologies if I'm missing something obvious. Can you add a test that involves exceptions if you have this covered already?
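
For readers unfamiliar with the table lookup described in this comment, here is a minimal Python sketch of decoding PREL31 fields and binary searching an index table. It is a simplified illustration, not any unwinder's actual code: the names `prel31_to_addr` and `find_exidx_entry` and the in-memory representation of the table as a list of word pairs are assumptions.

```python
import bisect


def prel31_to_addr(word, place):
    """Decode a PREL31 field: a 31-bit signed offset relative to `place`."""
    offset = word & 0x7FFFFFFF
    if offset & 0x40000000:       # sign-extend from bit 30
        offset -= 0x80000000
    return (place + offset) & 0xFFFFFFFF


def find_exidx_entry(exidx_words, table_addr, pc):
    """Return the index of the entry covering pc, or None if pc precedes the table.

    exidx_words -- list of (first_word, second_word) pairs sorted by function address
    table_addr  -- run-time address of the first entry (each entry is 8 bytes)
    """
    starts = [prel31_to_addr(first, table_addr + 8 * i)
              for i, (first, _) in enumerate(exidx_words)]
    i = bisect.bisect_right(starts, pc) - 1
    return i if i >= 0 else None
```

The lookup only works if each function sits at a fixed distance from its table entry, which is exactly the property that moving bins at load time breaks.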

lib/Target/ARM/ARMAsmPrinter.cpp
891

What happens if LTO is not used? Is it just that only the current module gets put into bins and the rest doesn't, which is sub-optimal but correct, or is it fatal? If it is fatal, is there any way of asserting that LTO is required? If not, can the comment be expanded?

lib/Target/ARM/ARMConstantPoolValue.h
220

Typo "Mvodifier" should be "Modifier"

lib/Target/ARM/ARMFastISel.cpp
2968

style nit, I think that there is only one statement so the braces aren't needed.

2993

Would "Add the GOT address from POT[0]" be a bit clearer?

lib/Target/ARM/ARMISelLowering.cpp
2040

Given the assert above, could this be true?

rinon updated this revision to Diff 122019.Nov 7 2017, 4:35 PM
rinon marked 2 inline comments as done.
  • Fix nits
  • Clarify comments
  • Remove unnecessary PIP addressing check
rinon added a comment.Nov 7 2017, 4:35 PM

I'm mostly a linker person so I'm not best qualified to review the backend changes. I've tried to take a look from an overall toolchain perspective.

Thanks again!

How does pagerando handle the Arm exception tables? ...

Yes, pagerando disrupts the PREL relocations in the exidx tables. Our current solution is to extend the platform or C++ runtime libunwind (_Unwind_RaiseException) to canonicalize the IP back to its original offset in the file + the DSO base address before looking it up in the exidx tables. Depending on the platform and libunwind implementation, this information may be easily available in /proc/PID/maps. I believe the upstream libunwind actually does load maps if available. However, we probably can't rely on reading the maps file, since the LLVM libunwind does not use it.

The alternative to canonicalizing the IP would be, as you pointed out, to dynamically relocate the exception handling table, but that will add more load time and break the ARM spec, both of which seem unacceptable to me. Additionally, unwind libraries that read the ELF file from disk (without those relocations) will still be confused.

The following is a brief description of how we can use the POT to canonicalize addresses for unwinding without reading /proc/PID/maps. I should probably add this to the overall RFC, sorry for not covering this earlier.

To avoid relying on /proc/PID/maps, we must find the POT entry for the current IP. To do so, the unwinder must first find the POT for the current stack frame, which will always be located at the address in R9, assuming that unwinding of deeper stack frames was correct and the unwinder was able to retrieve R9 from its callee-saved stack slot. The unwinder can then iterate the POT and find the entry with the largest value <= IP, which should be the POT entry for the bin containing IP.
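
A minimal sketch of that POT scan, written in Python for illustration only. The POT is modeled as a list of addresses and `find_pot_index` is a hypothetical name, not an existing libunwind function.

```python
def find_pot_index(pot, ip):
    """Return the index of the POT entry with the largest value <= ip, or None."""
    best = None
    for i, base in enumerate(pot):
        if base <= ip and (best is None or base > pot[best]):
            best = i
    return best


# Made-up POT: GOT address first, then two bin bases in randomized order.
pot = [0x20000, 0x7A000, 0x31000]
idx = find_pot_index(pot, 0x31124)
```

The scan alone cannot tell whether IP really lies inside the selected bin, which is why the result still needs to be checked against the segment size.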

We can extract the required segment file offset information from the ELF program headers. As long as the linker preserves the ordering of the pagerando segments between the ELF file and their position in the POT, the unwinder can look up the segment corresponding to the POT index found in the previous step. This segment's program header gives us both the original file offset and the segment size, allowing the unwinder to double check that IP indeed falls into that segment.

With a pagerando-randomized IP and the file offset of the corresponding ELF segment, the unwinder can canonicalize the IP to the address it would be located at if all pagerando pages had been laid out contiguously with the rest of the DSO. Using this canonical IP allows the exception handling table to be referenced normally, without any dynamic relocations.
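
The canonicalization itself could look roughly like the following Python sketch. It assumes the bin's run-time base address has already been found in the POT, and that the segment's recorded offset equals its distance from the DSO base in the contiguous layout. `Segment` and `canonicalize_ip` are hypothetical names for illustration, not part of any existing unwinder.

```python
from collections import namedtuple

# Hypothetical stand-in for the program header fields used here.
Segment = namedtuple("Segment", ["offset", "size"])


def canonicalize_ip(ip, bin_base, segment, dso_base):
    """Map a randomized ip to where it would be if the bin were contiguous with the DSO.

    Returns None if ip does not fall inside the segment, mirroring the
    bounds check against the segment size.
    """
    delta = ip - bin_base
    if not 0 <= delta < segment.size:
        return None
    return dso_base + segment.offset + delta


# Made-up values: bin loaded at 0x31000, originally at offset 0x5000 in the DSO.
canonical = canonicalize_ip(0x31124, 0x31000, Segment(0x5000, 0x1000), 0x10000)
```

The canonical address can then be looked up in the unmodified exidx table as if the bin had never been moved.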

lib/Target/ARM/ARMAsmPrinter.cpp
891

Calculating the static offset for a POT entry during code generation requires that the entire POT is laid out in the compiler, which requires LTO. We would need a new static relocation to refer to a POT entry if the POT was constructed after code generation. I'll add that to the comment here.

I don't know any good way to check for LTO at this point. POTOFF modifiers should only be emitted for functions marked with the pagerando attribute, and that is only done by a pass run during LTO. I'll look around to see if there might be a way to assert on this.

lib/Target/ARM/ARMFastISel.cpp
2968

Sorry about that. I've seen both styles for an if followed by a multi-line statement, so I wasn't sure. Fixed now.

lib/Target/ARM/ARMISelLowering.cpp
2040

Given that we don't support pagerando for Windows, no, this can never be hit. Besides, we already handle PIP addressing in the UsePIPAddressing branch. I'll remove this.

rinon updated this revision to Diff 122378.Nov 9 2017, 5:32 PM

Properly handle NULL global value in call lowering

rinon updated this revision to Diff 130975.Jan 22 2018, 3:30 PM
rinon marked 2 inline comments as done.

Rebase

rinon edited the summary of this revision.Jan 22 2018, 3:34 PM
rinon edited the summary of this revision.Jan 23 2018, 2:32 PM
rinon updated this revision to Diff 131175.Jan 23 2018, 5:23 PM

Fix POT/bin offset references to aliases.

rinon updated this revision to Diff 131297.Jan 24 2018, 9:43 AM

Don't allow FastISel for libcalls from pagerando functions

rinon updated this revision to Diff 140718.Apr 2 2018, 5:43 PM
  • Don't tailcall from non-pagerando to pagerando functions
  • Use a heap-allocated array rather than a VLA
  • Rebase
hintonda removed a subscriber: hintonda.Apr 4 2018, 4:39 PM
rinon updated this revision to Diff 156645.Jul 20 2018, 5:01 PM

Rebase.

  • Update ARM tests