Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Details
- Reviewers
leonardchan mcgrathr phosek hans lebedev.ri - Commits
- rGe96df3e531f5: [Passes] Add relative lookup table converter pass
rG5178ffc7cf92: [Passes] Add relative lookup table converter pass
rG5fd001a5ffba: [Passes] Add relative lookup table converter pass
rG78a65cd945d0: [Passes] Add relative lookup table converter pass
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
One thing that just occurred to me: do we also perhaps want to hide this behind a flag? Right now it's being added to various default optimization pipelines, so some users might be surprised if they suddenly see their lookup tables change (either compiler or user generated). Do we want to make this something more opt-in, or is the layout itself not necessarily a concern to most users?
clang/test/CodeGen/switch-to-lookup-table.c | ||
---|---|---|
2 ↗ | (On Diff #318372) | +1 on this. Unless this functionally changes something in the clang codebase, this test shouldn't be here. As hans pointed out, setting the -relocation-model should be enough. |
llvm/include/llvm/CodeGen/BasicTTIImpl.h | ||
384 | Sorry, I think you might have explained this offline, but what was the reason for needing this in TTI? I would've though this information could be found in the Module (PIC/no PIC, 64-bit or not, code model). If it turns out all of this is available in Module, then we could greatly simplify some logic here by just checking this at the start of the pass run. If TTI is needed, then perhaps it may be better to just inline all these checks in convertToRelativeLookupTables since this is the only place this is called. I think we would only want to keep this here as a virtual method if we plan to have multiple TTI-impls overriding this. | |
llvm/lib/Transforms/IPO/PassManagerBuilder.cpp | ||
880 | Should this be added at the end of the pipeline? It could be possible that other passes insert lookup tables but they may be untouched by this if this pass runs before them. | |
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp | ||
31–32 | Nit: // | |
41 | isa | |
54–71 | I think I mentioned this in a previous round of comments, but what you probably want here is just a way to determine if the operand is either a global or some constant offset from global. Right now it looks this this will ignore other constant expressions like bitcasts or ptrtoints which we also want to catch. I think it might be better to use IsConstantOffsetFromGlobal which already handles these cases. | |
80 | cast | |
219 | Should we be returning the value returned by this? |
- Used IsConstantOffsetFromGlobal() function
- Added this pass to the end of legacy pass manager pipeline
- Moved all the tests under a new test directory
llvm/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
384 | Code model or PIC/noPIC is only set in the module if the user explicitly specifies them. I think the checks cannot be moved into the pass because it uses the TargetMachine which is only available/visible in the TTI implementation itself. |
LGTM pending a few more comments. Should also give some time to let others respond if they have feedback.
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp | ||
---|---|---|
41–42 | Not needed here since you have the isa below. | |
llvm/test/Transforms/RelLookupTableConverter/relative_lookup_table.ll | ||
2–3 | We should also check some other RUNs to check that this isn't run on cases that return false in shouldBuildRelLookupTables: non-PIC, non-64-bit, other code model sizes, etc. | |
llvm/test/Transforms/RelLookupTableConverter/switch_relative_lookup_table.ll | ||
39 | It looks like this test case isn't much different from string_table in relative_lookup_table.ll? If so, then this file could be removed. | |
llvm/utils/gn/secondary/llvm/lib/Transforms/Utils/BUILD.gn | ||
64 | Good that you added this, but I think Nico has a bot that automatically updates these BUILD.gn files so manually updating them may not be necessary. |
llvm/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
390–391 |
|
Sorry for being unresponsive for a while, I got distracted by various bugs.
I skimmed this and it's looking great. Just added a few nit picks.
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp | ||
---|---|---|
32 | It would be better if the comment said why. I suppose the reason is we need to be sure there aren't other uses of the table, because then it can't be replaced. But it would be cool if a future version of this patch could handle when there are multiple loads from the table which can all be replaced -- for example this could happen if a function which uses a lookup table gets inlined into multiple places. | |
47 | It would be good with a "why" here too. |
- Add no_relative_lookup_table.ll test case that checks the cases where relative lookup table should not be generated
- Add comments about dso_local check and single use check
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp | ||
---|---|---|
32 | I actually ran into the exact case that you described during testing, where a function that uses a switch gets inlined into multiple call sites :) | |
llvm/test/Transforms/RelLookupTableConverter/switch_relative_lookup_table.ll | ||
39 | I renamed this test case to no_relative_lookup_table.ll that checks the cases where relative lookup table should not be generated like in non-pic mode, medium or large code models, and 32 bit architectures, etc. |
llvm/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
390–391 |
I thought that it makes sense to enable this optimization by default on all the targets that can support it. |
llvm/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
390–391 | I'm sorry, i do not understand. |
llvm/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
390–391 | isArch64Bit checks whether we have a 64-bit architecture, right? |
Thank you!
llvm/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
390–391 |
D'oh. I really did read it as A*A*rch64 :/ Sorry. | |
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp | ||
173–174 | make_early_inc_range()? | |
201 | I would think this should be PreservedAnalyses PA; PA.preserveSet<CFGAnalyses>(); return PA; since this doesn't touch CFG at all. |
llvm/utils/gn/secondary/llvm/lib/Transforms/Utils/BUILD.gn | ||
---|---|---|
64 | Please don't touch gn files unless you use them. Simple file additions in cmake files are synced automatically http://github.com/llvmgnsyncbot You forgot to add the trailing comma and now I had to fix it up manually instead of doing nothing :P |
llvm/utils/gn/secondary/llvm/lib/Transforms/Utils/BUILD.gn | ||
---|---|---|
64 | Sorry about that Niko. I can fix it, so you don't need to do anything. |
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp | ||
---|---|---|
92 | This should use the same address space as the original global. |
It looks like this is breaking the Windows/ARM(64) target - it doesn't produce the right relative relocations for symbol differences. It can be reproduced with a testcase like this:
$ cat test.s .text func1: ret func2: ret .section .rdata,"dr" .p2align 2 table: .long func1-table .long func2-table $ clang -target aarch64-windows -c -o - test.s | llvm-objdump -r -s - <stdin>: file format coff-arm64 RELOCATION RECORDS FOR [.rdata]: OFFSET TYPE VALUE 0000000000000000 IMAGE_REL_ARM64_ADDR32 func1 0000000000000004 IMAGE_REL_ARM64_ADDR32 func2 Contents of section .text: 0000 c0035fd6 c0035fd6 .._..._. Contents of section .rdata: 0000 00000000 04000000 ........
Those relocations would need to be IMAGE_REL_ARM64_REL32. It looks like the arm/windows target has got the same issue as well.
Would you be ok with reverting this change until I can sort that out, or can we disable the pass for those targets until then?
It turned out to not be all that hard to fix actually, see D99572 for such a fix. If I can get that landed soon, I think we might not need to act on this one.
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll | ||
---|---|---|
322–323 | Putting a ModulePass in the middle of the CodeGen pass pipeline creates a "pass barrier": now instead of applying every pass to each function in turn, the old pass manager will stop, run this whole-module pass, and then run subseqeunt passes in the next function pass manager on each function in turn. This isn't ideal. @aeubanks, can you follow-up to make sure this is addressed? We had the same issues with the SymbolRewriter pass, which if you grep for "Rewrite Symbols" you can see has the same issue. I remember writing a patch to fix it, but I guess I never landed it. |
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll | ||
---|---|---|
322–323 | I see "Rewrite Symbols" in the codegen pipeline and yeah it's splitting the function pass manager. For this patch, can we just not add the pass to the legacy PM pipeline? It's deprecated and the new PM is already the default for the optimization pipeline. |
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll | ||
---|---|---|
322–323 | (https://reviews.llvm.org/D99707 for anybody interested) |
Would you be ok with reverting this change until I can sort that out, or can we disable the pass for those targets until then?
I will disable the pass for those targets for now.
When the issue is resolved, I would like to enable it for those targets as well.
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll | ||
---|---|---|
322–323 |
FYI, I pushed the fix for the aarch64-coff issue now (D99572, rGd5c5cf5ce8d921fc8c5e1b608c298a1ffa688d37) and pushed another commit to remove the code for disabling the pass on aarch64 (rG57b259a852a6383880f5d0875d848420bb3c2945).
FYI, I pushed the fix for the aarch64-coff issue now (D99572, rGd5c5cf5ce8d921fc8c5e1b608c298a1ffa688d37) and pushed another commit to remove the code for disabling the pass on aarch64 (rG57b259a852a6383880f5d0875d848420bb3c2945).
Thank you @mstorsjo!
This patch breaks a two stage build with LTO:
$ git bisect log # bad: [f0bc2782f281ca05221d2f1735bbaff6c4b81ebb] [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth # good: [9829f5e6b1bca9b61efc629770d28bb9014dec45] [CVP] @llvm.[us]{min,max}() intrinsics handling git bisect start 'f0bc2782f281ca05221d2f1735bbaff6c4b81ebb' '9829f5e6b1bca9b61efc629770d28bb9014dec45' # bad: [1af35e77f4b8c3314dc20a10d579b52f22c75a00] [TTI] NFC: Change getVectorInstrCost to return InstructionCost git bisect bad 1af35e77f4b8c3314dc20a10d579b52f22c75a00 # bad: [45f8946a759a780e6131256d6d206977b9c128ee] [CodeView] Fix the ARM64 CPUType enum git bisect bad 45f8946a759a780e6131256d6d206977b9c128ee # good: [0b439e4cc9dbb5c226121383b84d4f48ab669c55] [libc++] Split std::allocator out of <memory> git bisect good 0b439e4cc9dbb5c226121383b84d4f48ab669c55 # good: [2eb98d89ac866e32cb56727174e4d1c1413479c8] [mlir][spirv] Allow bitwidth emulation on runtime arrays git bisect good 2eb98d89ac866e32cb56727174e4d1c1413479c8 # bad: [80aa9b0f7b3ebe53220a398b2939610d8a49e24b] [PowerPC] stop reverse mem op generation for some cases. git bisect bad 80aa9b0f7b3ebe53220a398b2939610d8a49e24b # good: [a8ab1f98d22cf15f39dd1c2ce77675e628fceb31] [Evaluator] Look through invariant.group intrinsics git bisect good a8ab1f98d22cf15f39dd1c2ce77675e628fceb31 # bad: [30f591c3869f3bbe6eca1249dcef1b8337312de6] [lldb] Disable TestLaunchProcessPosixSpawn.py with reproducers git bisect bad 30f591c3869f3bbe6eca1249dcef1b8337312de6 # good: [6c4f2508e4278ac789230cb05f2bb56a8a7297dc] Revert "[lldb] [gdb-remote client] Refactor handling qSupported" git bisect good 6c4f2508e4278ac789230cb05f2bb56a8a7297dc # bad: [e96df3e531f506eea75da0f13d0f8aa9a267f975] [Passes] Add relative lookup table converter pass git bisect bad e96df3e531f506eea75da0f13d0f8aa9a267f975 # good: [1310a19af06262122a6e9e4f6fbbe9c39ebad76e] [mlir] Use MCJIT to fix integration tests git bisect good 1310a19af06262122a6e9e4f6fbbe9c39ebad76e # first bad commit: [e96df3e531f506eea75da0f13d0f8aa9a267f975] [Passes] Add relative lookup table converter pass
My reproduction steps, apologies if they are not reduced enough:
$ mkdir -p build/stage{1,2} $ cmake \ -B build/stage1 \ -G Ninja \ -DCMAKE_C_COMPILER=$(command -v clang) \ -DCMAKE_CXX_COMPILER=$(command -v clang++) \ -DLLVM_USE_LINKER=$(command -v ld.lld) \ -DLLVM_ENABLE_PROJECTS="clang;lld" \ -DLLVM_TARGETS_TO_BUILD=host \ -DLLVM_CCACHE_BUILD=ON \ -DCMAKE_BUILD_TYPE=Release \ llvm ... $ ninja -C build/stage1 all ... $ cmake \ -B build/stage2 \ -G Ninja \ -DCMAKE_AR=$PWD/build/stage1/bin/llvm-ar \ -DCMAKE_C_COMPILER=$PWD/build/stage1/bin/clang \ -DCLANG_TABLEGEN=$PWD/build/stage1/bin/clang-tblgen \ -DCMAKE_CXX_COMPILER=$PWD/build/stage1/bin/clang++ \ -DLLVM_USE_LINKER=$PWD/build/stage1/bin/ld.lld \ -DLLVM_TABLEGEN=$PWD/build/stage1/bin/llvm-tblgen \ -DCMAKE_RANLIB=$PWD/build/stage1/bin/llvm-ranlib \ -DLLVM_ENABLE_PROJECTS=clang \ -DLLVM_TARGETS_TO_BUILD=host \ -DCMAKE_BUILD_TYPE=Release \ -DLLVM_ENABLE_LTO=Full \ llvm $ ninja -C build/stage2 lib/libclang-cpp.so.13git ... [2296/2296] Linking CXX shared library lib/libclang-cpp.so.13git FAILED: lib/libclang-cpp.so.13git ... ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<3600l, 1l> >::value; recompile with -fPIC >>> defined in lto.tmp >>> referenced by ld-temp.o >>> lto.tmp:(.data.rel.ro..Lreltable._ZN5clang10TargetInfo21getTypeFormatModifierENS_23TransferrableTargetInfo7IntTypeE+0x8) ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<3600l, 1l> >::value; recompile with -fPIC >>> defined in lto.tmp >>> referenced by ld-temp.o >>> lto.tmp:(.data.rel.ro..Lreltable._ZN5clang10TargetInfo21getTypeFormatModifierENS_23TransferrableTargetInfo7IntTypeE+0xC) ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<3600l, 1l> >::value; recompile with -fPIC >>> defined in lto.tmp >>> referenced by ld-temp.o >>> lto.tmp:(.data.rel.ro..Lreltable._ZNK5clang21analyze_format_string14LengthModifier8toStringEv+0x8) ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<60l, 1l> >::value; recompile with -fPIC >>> defined in lto.tmp >>> referenced by ld-temp.o >>> lto.tmp:(.data.rel.ro..Lreltable._ZNK5clang21analyze_format_string14LengthModifier8toStringEv+0x3C) ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<3600l, 1l> >::value; recompile with -fPIC >>> defined in lto.tmp >>> referenced by ld-temp.o >>> lto.tmp:(.data.rel.ro..Lreltable._ZN4llvm15MCSymbolRefExpr18getVariantKindNameENS0_11VariantKindE+0xC0) clang-13: error: linker command failed with exit code 1 (use -v to see invocation) ninja: build stopped: subcommand failed.
This patch breaks a two stage build with LTO:
Thanks for the report @nathanchance, and I'm looking at it now.
Is this failure from one of the bots?
Thanks!
Is this failure from one of the bots?
No, this was discovered by a user of my LLVM build script:
https://github.com/ClangBuiltLinux/tc-build
https://github.com/kdrag0n/proton-clang-build/runs/2371499466?check_suite_focus=true
@nathanchance do you prefer me to revert the patch first as it might take me a while to investigate it?
Btw, I was able to reproduce the issue by following your steps.
I would say if it is going to take longer than the end of the week to fix the issue, it might be nice to have it reverted so that other people's builds continue to work (I know a few people who build with full LTO in a two stage configuration every week).
I would say if it is going to take longer than the end of the week to fix the issue, it might be nice to have it reverted so that other people's builds continue to work (I know a few people who build with full LTO in a two stage configuration every week).
Definitely! If I cannot fix it in two days, I'll revert the change.
Update: @nathanchance I need to investigate this issue further, so I disabled this pass in LTO pre-link phase to fix the broken build (https://reviews.llvm.org/D94355).
Please let me know if there is still an issue in that build!
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp | ||
---|---|---|
74 | In the version of the patch that you committed, you have this check here: // If operand is mutable, do not generate a relative lookup table. auto *GlovalVarOp = dyn_cast<GlobalVariable>(GVOp); if (!GlovalVarOp || !GlovalVarOp->isConstant()) return false;
|
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp | ||
---|---|---|
74 |
|
FWIW, this commit turned out to break the FreeBSD dns/bind916 port, see https://bugs.freebsd.org/259921.
The short story is that the bind9 code on and after this line: https://gitlab.isc.org/isc-projects/bind9/-/blob/main/lib/isc/log.c#L1525 gets changed from something like:
.Ltmp661: #DEBUG_VALUE: isc_log_doit:category_channels <- $r12 .loc 3 0 58 # log.c:0:58 xorl %eax, %eax testl %r15d, %r15d setg %al movl %r15d, %ecx negl %ecx movq %rcx, -840(%rbp) # 8-byte Spill leaq 8328(%r13), %rcx #DEBUG_VALUE: isc_log_doit:matched <- 0 movq %rcx, -808(%rbp) # 8-byte Spill .Ltmp662: .loc 3 1552 25 is_stmt 1 # log.c:1552:25
to using a relative lookup table:
.Ltmp661: #DEBUG_VALUE: isc_log_doit:category_channels <- $r12 .loc 3 0 58 # log.c:0:58 xorl %eax, %eax testl %r15d, %r15d setg %al movl %r15d, %edx negl %edx leaq reltable.isc_log_doit(%rip), %rcx movq %rdx, -848(%rbp) # 8-byte Spill movslq (%rcx,%rdx,4), %rdx addq %rcx, %rdx movq %rdx, -840(%rbp) # 8-byte Spill leaq 8328(%r13), %rcx #DEBUG_VALUE: isc_log_doit:matched <- 0 movq %rcx, -808(%rbp) # 8-byte Spill .Ltmp662: .loc 3 1552 25 is_stmt 1 # log.c:1552:25
However, the value of %rcx at the movslq (%rcx,%rdx,4), %rdx statement becomes -2, so it attempts to access data before reltable.isc_log_doit. As that is in .rodata, this leads to a segfault.
The current working theory is that some code is hoisted out of the do-while loop starting at https://gitlab.isc.org/isc-projects/bind9/-/blob/main/lib/isc/log.c#L1531, in particular the [-level] accesses on lines 1613 and 1843:
snprintf(level_string, sizeof(level_string), "%s: ", log_level_strings[-level]); ... } else { syslog_level = syslog_map[-level]; }
but maybe these negative offsets confuse the lookup table converter?
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp | ||
---|---|---|
133 | This line causes the bug seen in bind. In that case, the GEP has been hoisted, but the load has not. In general the GEP could be in a different basic block, or even in the same basic block with an instruction that may not return (intrinsic, real function call, well-defined language-level exception, etc). You can insert the reltable.shift where the GEP is, and that probably makes sense given it serves (part of) the same purpose, but you *must* insert the actual reltable.intrinsic where the original load is, unless you've gone to great lengths to prove it's safe not to (which seems best left to the usual culprits like LICM). IR test cases: https://godbolt.org/z/YMdaMrobE (bind is characterised by the first of the two functions) |
llvm/include/llvm/CodeGen/BasicTTIImpl.h | ||
---|---|---|
396 | The meanings of code models isn't really portable across targets... e.g. RISC-V's medium (underlying LLVM name for -mcmodel=medany) assumes 32-bit PC-relative offsets, and thus using a 32-bit table-relative offset is safe |
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp | ||
---|---|---|
133 | @dim and @jrtc27 thank you for reporting it. |