This is an archive of the discontinued LLVM Phabricator instance.

AArch64: make sure jump table entries can reach entire image
ClosedPublic

Authored by t.p.northover on Sep 8 2020, 6:33 AM.

Details

Reviewers
efriedma
Summary

This patch is addressing two issues:

  1. https://bugs.llvm.org/show_bug.cgi?id=47124. A weak function containing a jump table causes a linker warning on Darwin because the entries (LBB - lJTI) refer back to the function and can't be overridden.
  2. The AArch64 "small" code model that we default to requires that all statically allocated code and data is within a 4GB window, somewhere in memory. Unfortunately our jump table entries are 32-bit signed constants which only have a 2GB reach, and they're located in a different section (to allow for things like execute-only memory).

Without that second issue I'd prefer to silence the linker warning somehow, and without the first I'd be tempted to let sleeping dogs lie, but together I think they make a compelling enough case to change how we do jump tables. Currently we emit something like

    adrp x0, LJTI0_0@PAGE
    add x0, x0, LJTI0_0@PAGEOFF
    ldrsw x1, [x0, x2, lsl #2]
    add x0, x0, x1
    br x0
    [...]
LJTI0_0:
    .word LBB0_0 - LJTI0_0

If we can no longer use LJTI0_0 as the base we need something within the function that has to be materialized separately. So this changes the sequence to be

    adrp x0, LJTI0_0@PAGE
    add x0, x0, LJTI0_0@PAGEOFF
Ltmp0:
    adr x1, Ltmp0
    ldrsw x2, [x0, x2, lsl #2]
    add x1, x1, x2, lsl #2
    br x1
[...]
LJTI0_0:
    .word (LBB0_0 - Ltmp0)>>2

Diff Detail

Event Timeline

t.p.northover created this revision.Sep 8 2020, 6:33 AM
t.p.northover requested review of this revision.Sep 8 2020, 6:34 AM

Decided shifting 32-bit entries was pointless and possibly suboptimal, so skip it. Also noticed that the resize could truncate the SmallVector so guarded against that and added test.

Given that most jump tables are probably compressed anyway, I don't expect any big performance impact. But it would be nice to have some numbers.

llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
858

For 32-bit, we sign-extend? I guess this is assuming a function can't be larger than 2GB? (I think you could avoid that assumption by using the compression infrastructure for 32-bit words.)

efriedma accepted this revision.Sep 10 2020, 4:52 PM

LGTM, but I still would like to see some numbers before merge.

llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
858

On second thought, you're likely to run into other issues well before then, so probably not important.

This revision is now accepted and ready to land.Sep 10 2020, 4:52 PM

Thanks Eli, I'll get back to you with numbers.

I've also discovered (well, Ahmed did) that it's not robust in the face of tail duplication. The first adr would steal and emit the label, which might not be in range of the duplicated adr. The fix is easy enough (isNotDuplicable in TableGen) but I still need to write a nice tempting IR file for the test.

llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp
858

Yes, I think the function limit is probably well below 2GB already.

Instead of trying to make the table relative to the jump/adr, can we use the approach we use for compressed jump tables (using the address of the first destination as the base) unconditionally? Seems less complicated overall. I guess making it work in general costs one extra instruction in general: if you want to handle functions larger than the range of adr, you need to use adrp instead of adr. But functions that large are rare, and the potential upside from duplicating jumptable jumps is pretty large.

Instead of trying to make the table relative to the jump/adr, can we use the approach we use for compressed jump tables (using the address of the first destination as the base) unconditionally? Seems less complicated overall.

As you point out, adr may not have enough range in that case so it would have to be an adrp system. Further, because of MachO limitations on relocation addends (have I said how much I hate that format in the last week? Time to do it again), it can't be anything inside the function, leaving only the function symbol itself.

All in all, I was actually quite happy with this patch on the complexity front. It unifies the three JumpTableDest paths, squeaking in at removing code on average (+50/-62 excluding tests). I don't think the same would be true for a split adrp/adr solution, even if we don't try to optimize and produce adr when possible.

  • Mark non-duplicable
  • Fix incorrect immediate for ldrsw shift component.

Sorry about the delay in getting back to you with benchmarks, it took me a while to get LNT running reliably again. I don't think there was anything really significant:

CFP2006/453.povray/453.povray.exec56.555.7-1.423%
CINT2006/473.astar/473.astar.exec391.1389.8-0.326%
CFP2006/470.lbm/470.lbm.exec241.0240.7-0.112%
CINT2006/429.mcf/429.mcf.exec85.485.3-0.06909%
CINT2006/483.xalancbmk/483.xalancbmk.exec1271.81271.0-0.06039%
CFP2006/433.milc/433.milc.exec169.8169.7-0.01231%
CINT2006/464.h264ref/464.h264ref.exec594.7594.7+0.001244%
CFP2006/444.namd/444.namd.exec144.9144.9+0.001587%
CINT2006/456.hmmer/456.hmmer.exec412.0412.1+0.01723%
CINT2006/401.bzip2/401.bzip2.exec336.0336.2+0.06515%
CFP2006/447.dealII/447.dealII.exec568.0568.6+0.1038%
CINT2006/458.sjeng/458.sjeng.exec714.1715.4+0.1882%
CINT2006/403.gcc/403.gcc.exec5.25.2+0.2582%
CINT2006/445.gobmk/445.gobmk.exec439.2440.4+0.2823%
CINT2006/462.libquantum/462.libquantum.exec10.110.1+0.3071%
CFP2006/482.sphinx3/482.sphinx3.exec61.661.8+0.3138%
CFP2006/450.soplex/450.soplex.exec36.736.8+0.3174%
CINT2006/400.perlbench/400.perlbench.exec139.1139.6+0.4139%
CINT2006/471.omnetpp/471.omnetpp.exec527.3530.6+0.6203%

Fix to win64-jumptable.ll hadn't been committed.

Further, because of MachO limitations on relocation addends (have I said how much I hate that format in the last week? Time to do it again), it can't be anything inside the function, leaving only the function symbol itself.

LLVM seems to accept "adrp x8, (f+1)@PAGE"; does that not do the right thing?

I'm a little concerned about the potential performance impact of the isNotDuplicable marking (that's the one change that impacts compressed jump tables), but your results seem to indicate it isn't significant.

efriedma accepted this revision.Sep 17 2020, 3:52 PM

Further, because of MachO limitations on relocation addends (have I said how much I hate that format in the last week? Time to do it again), it can't be anything inside the function, leaving only the function symbol itself.

LLVM seems to accept "adrp x8, (f+1)@PAGE"; does that not do the right thing?

Oh, that has limited range; I guess it doesn't help.

In that case, LGTM the way it is, I guess.

t.p.northover closed this revision.Sep 18 2020, 1:54 AM

I'm a little concerned about the potential performance impact of the isNotDuplicable marking (that's the one change that impacts compressed jump tables), but your results seem to indicate it isn't significant.

I actually suspect that might be positive, or at least not as clear cut. The duplication code is pretty sensitive about the size of the sequence it duplicates, and doesn't know this expands to 3 real instructions (to even get the test working on simple code I had to increase the threshhold via a dev command-line option).

Oh, that has limited range; I guess it doesn't help.

Yep, that's actually what started this whole saga (someone was obfuscating their code and ended up with a huge function with broken jump tables).

Anyway, thanks for the review. It's committed now as 2afe4becec7.