Page MenuHomePhabricator

[Passes] Add relative lookup table converter pass
ClosedPublic

Authored by gulfem on Jan 8 2021, 6:28 PM.

Details

Summary

Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
gulfem added a comment.Mar 4 2021, 6:29 PM

Use TTI hook for target machine checks

One thing that just occurred to me: do we also perhaps want to hide this behind a flag? Right now it's being added to various default optimization pipelines, so some users might be surprised if they suddenly see their lookup tables change (either compiler or user generated). Do we want to make this something more opt-in, or is the layout itself not necessarily a concern to most users?

clang/test/CodeGen/switch-to-lookup-table.c
2 ↗(On Diff #318372)

+1 on this. Unless this functionally changes something in the clang codebase, this test shouldn't be here. As hans pointed out, setting the -relocation-model should be enough.

llvm/include/llvm/CodeGen/BasicTTIImpl.h
384

Sorry, I think you might have explained this offline, but what was the reason for needing this in TTI? I would've though this information could be found in the Module (PIC/no PIC, 64-bit or not, code model). If it turns out all of this is available in Module, then we could greatly simplify some logic here by just checking this at the start of the pass run.

If TTI is needed, then perhaps it may be better to just inline all these checks in convertToRelativeLookupTables since this is the only place this is called. I think we would only want to keep this here as a virtual method if we plan to have multiple TTI-impls overriding this.

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
880

Should this be added at the end of the pipeline? It could be possible that other passes insert lookup tables but they may be untouched by this if this pass runs before them.

llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
31–32

Nit: //

41

isa

54–71

I think I mentioned this in a previous round of comments, but what you probably want here is just a way to determine if the operand is either a global or some constant offset from global. Right now it looks this this will ignore other constant expressions like bitcasts or ptrtoints which we also want to catch. I think it might be better to use IsConstantOffsetFromGlobal which already handles these cases.

80

cast

219

Should we be returning the value returned by this?

gulfem updated this revision to Diff 330405.Mar 12 2021, 5:22 PM
  1. Used IsConstantOffsetFromGlobal() function
  2. Added this pass to the end of legacy pass manager pipeline
  3. Moved all the tests under a new test directory
gulfem marked 7 inline comments as done.Mar 12 2021, 6:12 PM
gulfem added inline comments.
llvm/include/llvm/CodeGen/BasicTTIImpl.h
384

Code model or PIC/noPIC is only set in the module if the user explicitly specifies them.
TTI hook is necessary to access target machine information like the default code model.
TTI is basically the interface to communicate between IR transformations and codegen.

I think the checks cannot be moved into the pass because it uses the TargetMachine which is only available/visible in the TTI implementation itself.
That's why I added a function into TTI to do target machine specific checks.
Similar approach is used during lookup table generation (shouldBuildLookupTables) to check codegen info.

leonardchan accepted this revision.Mar 15 2021, 11:56 AM

LGTM pending a few more comments. Should also give some time to let others respond if they have feedback.

llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
41–42

Not needed here since you have the isa below.

llvm/test/Transforms/RelLookupTableConverter/relative_lookup_table.ll
2–3

We should also check some other RUNs to check that this isn't run on cases that return false in shouldBuildRelLookupTables: non-PIC, non-64-bit, other code model sizes, etc.

llvm/test/Transforms/RelLookupTableConverter/switch_relative_lookup_table.ll
39

It looks like this test case isn't much different from string_table in relative_lookup_table.ll? If so, then this file could be removed.

llvm/utils/gn/secondary/llvm/lib/Transforms/Utils/BUILD.gn
64

Good that you added this, but I think Nico has a bot that automatically updates these BUILD.gn files so manually updating them may not be necessary.

This revision is now accepted and ready to land.Mar 15 2021, 11:56 AM
lebedev.ri added inline comments.Mar 15 2021, 12:04 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
390–391
  1. But all tests are using x86_64 triple?
  2. This is somewhat backwards. if the target wants to disable this, it will need to override this function with return false;.
hans added a comment.Mar 16 2021, 10:56 AM

Sorry for being unresponsive for a while, I got distracted by various bugs.

I skimmed this and it's looking great. Just added a few nit picks.

llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
32

It would be better if the comment said why. I suppose the reason is we need to be sure there aren't other uses of the table, because then it can't be replaced.

But it would be cool if a future version of this patch could handle when there are multiple loads from the table which can all be replaced -- for example this could happen if a function which uses a lookup table gets inlined into multiple places.

47

It would be good with a "why" here too.

gulfem updated this revision to Diff 331139.Mar 16 2021, 6:01 PM
  1. Add no_relative_lookup_table.ll test case that checks the cases where relative lookup table should not be generated
  2. Add comments about dso_local check and single use check
gulfem marked 4 inline comments as done.Mar 16 2021, 6:10 PM
gulfem added inline comments.
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
32

I actually ran into the exact case that you described during testing, where a function that uses a switch gets inlined into multiple call sites :)
This is only to simplify the analysis, and I now added a TODO section to explore that later.

llvm/test/Transforms/RelLookupTableConverter/switch_relative_lookup_table.ll
39

I renamed this test case to no_relative_lookup_table.ll that checks the cases where relative lookup table should not be generated like in non-pic mode, medium or large code models, and 32 bit architectures, etc.

gulfem marked 2 inline comments as done.Mar 16 2021, 6:20 PM
gulfem added inline comments.
llvm/include/llvm/CodeGen/BasicTTIImpl.h
390–391
  1. Although I used x86_64 triple, this optimization can be applied to other 64-bit architectures too, because it not target dependent except isArch64Bit and getCodeModel check.
  2. Is there a target that you have in mind that we need to disable this optimization?

I thought that it makes sense to enable this optimization by default on all the targets that can support it.
In case targets want to disable it, they can override it as you said.
How can we improve the implementation?
If you have suggestions, I'm happy to incorporate that.

@lebedev.ri do you have further comments?
If not, I would like to submit this patch.

lebedev.ri added inline comments.Mar 19 2021, 12:01 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
390–391

I'm sorry, i do not understand.
Why does !TM.getTargetTriple().isArch64Bit() check exist?
To me it reads as "if we aren't compiling for AArch64, don't build rel lookup tables".
Am i misreading this?

gulfem added inline comments.Mar 19 2021, 2:19 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
390–391

isArch64Bit checks whether we have a 64-bit architecture, right?
I don't think it specifically checks for AArch64, and it can cover other 64-bit architectures like x86_64 as well.

lebedev.ri accepted this revision.Mar 19 2021, 2:35 PM

Thank you!

llvm/include/llvm/CodeGen/BasicTTIImpl.h
390–391

isArch64Bit checks whether we have a 64-bit architecture, right?

D'oh. I really did read it as A*A*rch64 :/ Sorry.

llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
173–174

make_early_inc_range()?

201

I would think this should be

PreservedAnalyses PA;
PA.preserveSet<CFGAnalyses>();
return PA;

since this doesn't touch CFG at all.
I think this should get rid of redundant Running analysis: TargetIRAnalysis.

This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Mar 22 2021, 3:36 PM
thakis added inline comments.
llvm/utils/gn/secondary/llvm/lib/Transforms/Utils/BUILD.gn
64

Please don't touch gn files unless you use them. Simple file additions in cmake files are synced automatically http://github.com/llvmgnsyncbot

You forgot to add the trailing comma and now I had to fix it up manually instead of doing nothing :P

gulfem added inline comments.Mar 22 2021, 3:50 PM
llvm/utils/gn/secondary/llvm/lib/Transforms/Utils/BUILD.gn
64

Sorry about that Niko. I can fix it, so you don't need to do anything.
Leo actually pointed that out, but I thought that manually changing it won't do any harm.
Apparently it did though!

arichardson added inline comments.
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
92

This should use the same address space as the original global.

It looks like this is breaking the Windows/ARM(64) target - it doesn't produce the right relative relocations for symbol differences. It can be reproduced with a testcase like this:

$ cat test.s
        .text
func1:
        ret
func2:
        ret

        .section        .rdata,"dr"
        .p2align        2
table:
        .long   func1-table
        .long   func2-table
$ clang -target aarch64-windows -c -o - test.s | llvm-objdump -r -s -

<stdin>:        file format coff-arm64

RELOCATION RECORDS FOR [.rdata]:
OFFSET           TYPE                     VALUE
0000000000000000 IMAGE_REL_ARM64_ADDR32   func1
0000000000000004 IMAGE_REL_ARM64_ADDR32   func2

Contents of section .text:
 0000 c0035fd6 c0035fd6                    .._..._.
Contents of section .rdata: 
 0000 00000000 04000000                    ........

Those relocations would need to be IMAGE_REL_ARM64_REL32. It looks like the arm/windows target has got the same issue as well.

Would you be ok with reverting this change until I can sort that out, or can we disable the pass for those targets until then?

It looks like this is breaking the Windows/ARM(64) target - it doesn't produce the right relative relocations for symbol differences. It can be reproduced with a testcase like this:

[...]

Those relocations would need to be IMAGE_REL_ARM64_REL32. It looks like the arm/windows target has got the same issue as well.

Would you be ok with reverting this change until I can sort that out, or can we disable the pass for those targets until then?

It turned out to not be all that hard to fix actually, see D99572 for such a fix. If I can get that landed soon, I think we might not need to act on this one.

rnk added subscribers: aeubanks, rnk.Mar 31 2021, 1:16 PM
rnk added inline comments.
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
322–323

Putting a ModulePass in the middle of the CodeGen pass pipeline creates a "pass barrier": now instead of applying every pass to each function in turn, the old pass manager will stop, run this whole-module pass, and then run subseqeunt passes in the next function pass manager on each function in turn. This isn't ideal. @aeubanks, can you follow-up to make sure this is addressed?

We had the same issues with the SymbolRewriter pass, which if you grep for "Rewrite Symbols" you can see has the same issue. I remember writing a patch to fix it, but I guess I never landed it.

aeubanks added inline comments.Mar 31 2021, 1:52 PM
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
322–323

I see "Rewrite Symbols" in the codegen pipeline and yeah it's splitting the function pass manager.

For this patch, can we just not add the pass to the legacy PM pipeline? It's deprecated and the new PM is already the default for the optimization pipeline.

aeubanks added inline comments.Mar 31 2021, 11:31 PM
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
322–323

(https://reviews.llvm.org/D99707 for anybody interested)

gulfem added a comment.Apr 1 2021, 5:05 PM

Would you be ok with reverting this change until I can sort that out, or can we disable the pass for those targets until then?

I will disable the pass for those targets for now.
When the issue is resolved, I would like to enable it for those targets as well.

llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
322–323

For this patch, can we just not add the pass to the legacy PM pipeline? It's deprecated and the new PM is already the default for the optimization pipeline.

@rnk @aeubanks
If it causes issues, I'm ok to remove it from the legacy PM pipeline.
When I land this patch, I'll only add it to new PM.

Would you be ok with reverting this change until I can sort that out, or can we disable the pass for those targets until then?

I will disable the pass for those targets for now.
When the issue is resolved, I would like to enable it for those targets as well.

FYI, I pushed the fix for the aarch64-coff issue now (D99572, rGd5c5cf5ce8d921fc8c5e1b608c298a1ffa688d37) and pushed another commit to remove the code for disabling the pass on aarch64 (rG57b259a852a6383880f5d0875d848420bb3c2945).

FYI, I pushed the fix for the aarch64-coff issue now (D99572, rGd5c5cf5ce8d921fc8c5e1b608c298a1ffa688d37) and pushed another commit to remove the code for disabling the pass on aarch64 (rG57b259a852a6383880f5d0875d848420bb3c2945).

Thank you @mstorsjo!

This patch breaks a two stage build with LTO:

$ git bisect log
# bad: [f0bc2782f281ca05221d2f1735bbaff6c4b81ebb] [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth
# good: [9829f5e6b1bca9b61efc629770d28bb9014dec45] [CVP] @llvm.[us]{min,max}() intrinsics handling
git bisect start 'f0bc2782f281ca05221d2f1735bbaff6c4b81ebb' '9829f5e6b1bca9b61efc629770d28bb9014dec45'
# bad: [1af35e77f4b8c3314dc20a10d579b52f22c75a00] [TTI] NFC: Change getVectorInstrCost to return InstructionCost
git bisect bad 1af35e77f4b8c3314dc20a10d579b52f22c75a00
# bad: [45f8946a759a780e6131256d6d206977b9c128ee] [CodeView] Fix the ARM64 CPUType enum
git bisect bad 45f8946a759a780e6131256d6d206977b9c128ee
# good: [0b439e4cc9dbb5c226121383b84d4f48ab669c55] [libc++] Split std::allocator out of <memory>
git bisect good 0b439e4cc9dbb5c226121383b84d4f48ab669c55
# good: [2eb98d89ac866e32cb56727174e4d1c1413479c8] [mlir][spirv] Allow bitwidth emulation on runtime arrays
git bisect good 2eb98d89ac866e32cb56727174e4d1c1413479c8
# bad: [80aa9b0f7b3ebe53220a398b2939610d8a49e24b] [PowerPC] stop reverse mem op generation for some cases.
git bisect bad 80aa9b0f7b3ebe53220a398b2939610d8a49e24b
# good: [a8ab1f98d22cf15f39dd1c2ce77675e628fceb31] [Evaluator] Look through invariant.group intrinsics
git bisect good a8ab1f98d22cf15f39dd1c2ce77675e628fceb31
# bad: [30f591c3869f3bbe6eca1249dcef1b8337312de6] [lldb] Disable TestLaunchProcessPosixSpawn.py with reproducers
git bisect bad 30f591c3869f3bbe6eca1249dcef1b8337312de6
# good: [6c4f2508e4278ac789230cb05f2bb56a8a7297dc] Revert "[lldb] [gdb-remote client] Refactor handling qSupported"
git bisect good 6c4f2508e4278ac789230cb05f2bb56a8a7297dc
# bad: [e96df3e531f506eea75da0f13d0f8aa9a267f975] [Passes] Add relative lookup table converter pass
git bisect bad e96df3e531f506eea75da0f13d0f8aa9a267f975
# good: [1310a19af06262122a6e9e4f6fbbe9c39ebad76e] [mlir] Use MCJIT to fix integration tests
git bisect good 1310a19af06262122a6e9e4f6fbbe9c39ebad76e
# first bad commit: [e96df3e531f506eea75da0f13d0f8aa9a267f975] [Passes] Add relative lookup table converter pass

My reproduction steps, apologies if they are not reduced enough:

$ mkdir -p build/stage{1,2}

$ cmake \
   -B build/stage1 \
   -G Ninja \
   -DCMAKE_C_COMPILER=$(command -v clang) \
   -DCMAKE_CXX_COMPILER=$(command -v clang++) \
   -DLLVM_USE_LINKER=$(command -v ld.lld) \
   -DLLVM_ENABLE_PROJECTS="clang;lld" \
   -DLLVM_TARGETS_TO_BUILD=host \
   -DLLVM_CCACHE_BUILD=ON \
   -DCMAKE_BUILD_TYPE=Release \
   llvm
...

$ ninja -C build/stage1 all
...

$ cmake \
   -B build/stage2 \
   -G Ninja \
   -DCMAKE_AR=$PWD/build/stage1/bin/llvm-ar \
   -DCMAKE_C_COMPILER=$PWD/build/stage1/bin/clang \
   -DCLANG_TABLEGEN=$PWD/build/stage1/bin/clang-tblgen \
   -DCMAKE_CXX_COMPILER=$PWD/build/stage1/bin/clang++ \
   -DLLVM_USE_LINKER=$PWD/build/stage1/bin/ld.lld \
   -DLLVM_TABLEGEN=$PWD/build/stage1/bin/llvm-tblgen \
   -DCMAKE_RANLIB=$PWD/build/stage1/bin/llvm-ranlib \
   -DLLVM_ENABLE_PROJECTS=clang \
   -DLLVM_TARGETS_TO_BUILD=host \
   -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_LTO=Full \
   llvm

$ ninja -C build/stage2 lib/libclang-cpp.so.13git
...
[2296/2296] Linking CXX shared library lib/libclang-cpp.so.13git
FAILED: lib/libclang-cpp.so.13git
...
ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<3600l, 1l> >::value; recompile with -fPIC
>>> defined in lto.tmp
>>> referenced by ld-temp.o
>>>               lto.tmp:(.data.rel.ro..Lreltable._ZN5clang10TargetInfo21getTypeFormatModifierENS_23TransferrableTargetInfo7IntTypeE+0x8)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<3600l, 1l> >::value; recompile with -fPIC
>>> defined in lto.tmp
>>> referenced by ld-temp.o
>>>               lto.tmp:(.data.rel.ro..Lreltable._ZN5clang10TargetInfo21getTypeFormatModifierENS_23TransferrableTargetInfo7IntTypeE+0xC)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<3600l, 1l> >::value; recompile with -fPIC
>>> defined in lto.tmp
>>> referenced by ld-temp.o
>>>               lto.tmp:(.data.rel.ro..Lreltable._ZNK5clang21analyze_format_string14LengthModifier8toStringEv+0x8)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<60l, 1l> >::value; recompile with -fPIC
>>> defined in lto.tmp
>>> referenced by ld-temp.o
>>>               lto.tmp:(.data.rel.ro..Lreltable._ZNK5clang21analyze_format_string14LengthModifier8toStringEv+0x3C)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<3600l, 1l> >::value; recompile with -fPIC
>>> defined in lto.tmp
>>> referenced by ld-temp.o
>>>               lto.tmp:(.data.rel.ro..Lreltable._ZN4llvm15MCSymbolRefExpr18getVariantKindNameENS0_11VariantKindE+0xC0)
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.

This patch breaks a two stage build with LTO:

Thanks for the report @nathanchance, and I'm looking at it now.
Is this failure from one of the bots?

Thanks for the report @nathanchance, and I'm looking at it now.

Thanks!

Is this failure from one of the bots?

No, this was discovered by a user of my LLVM build script:

https://github.com/ClangBuiltLinux/tc-build

https://github.com/kdrag0n/proton-clang-build/runs/2371499466?check_suite_focus=true

gulfem added a comment.EditedApr 26 2021, 2:40 PM

@nathanchance do you prefer me to revert the patch first as it might take me a while to investigate it?
Btw, I was able to reproduce the issue by following your steps.

@nathanchance do you prefer me to revert the patch first as it might take me a while to investigate it?
Btw, I was able to reproduce the issue by following your steps.

I would say if it is going to take longer than the end of the week to fix the issue, it might be nice to have it reverted so that other people's builds continue to work (I know a few people who build with full LTO in a two stage configuration every week).

gulfem added a comment.EditedApr 28 2021, 3:45 PM

I would say if it is going to take longer than the end of the week to fix the issue, it might be nice to have it reverted so that other people's builds continue to work (I know a few people who build with full LTO in a two stage configuration every week).

Definitely! If I cannot fix it in two days, I'll revert the change.
Update: @nathanchance I need to investigate this issue further, so I disabled this pass in LTO pre-link phase to fix the broken build (https://reviews.llvm.org/D94355).
Please let me know if there is still an issue in that build!

pcc added a subscriber: pcc.Jun 23 2021, 8:50 PM
pcc added inline comments.
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
74

In the version of the patch that you committed, you have this check here:

// If operand is mutable, do not generate a relative lookup table.
auto *GlovalVarOp = dyn_cast<GlobalVariable>(GVOp);
if (!GlovalVarOp || !GlovalVarOp->isConstant())
  return false;
  1. Nit: Gloval -> Global
  2. Why is it important whether the referenced global is mutable? The pointer itself is constant.
gulfem added inline comments.Jun 25 2021, 4:16 PM
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
74
  1. That's a typo, and I will fix that.
  2. This optimization does not do a detailed analysis, and it is being conservative. In this case, GlobalVar points to the switch lookup table and GlobalVarOp points to the elements in the lookup table like strings. To make sure that relative arithmetic works, it just checks whether both GlobalVar and GlobalVarOp pointers are constants. Did you see an issue on that?

FWIW, this commit turned out to break the FreeBSD dns/bind916 port, see https://bugs.freebsd.org/259921.

The short story is that the bind9 code on and after this line: https://gitlab.isc.org/isc-projects/bind9/-/blob/main/lib/isc/log.c#L1525 gets changed from something like:

.Ltmp661:
        #DEBUG_VALUE: isc_log_doit:category_channels <- $r12
        .loc    3 0 58                          # log.c:0:58
        xorl    %eax, %eax
        testl   %r15d, %r15d
        setg    %al
        movl    %r15d, %ecx
        negl    %ecx
        movq    %rcx, -840(%rbp)                # 8-byte Spill
        leaq    8328(%r13), %rcx
        #DEBUG_VALUE: isc_log_doit:matched <- 0
        movq    %rcx, -808(%rbp)                # 8-byte Spill
.Ltmp662:
        .loc    3 1552 25 is_stmt 1             # log.c:1552:25

to using a relative lookup table:

.Ltmp661:
        #DEBUG_VALUE: isc_log_doit:category_channels <- $r12
        .loc    3 0 58                          # log.c:0:58
        xorl    %eax, %eax
        testl   %r15d, %r15d
        setg    %al
        movl    %r15d, %edx
        negl    %edx
        leaq    reltable.isc_log_doit(%rip), %rcx
        movq    %rdx, -848(%rbp)                # 8-byte Spill
        movslq  (%rcx,%rdx,4), %rdx
        addq    %rcx, %rdx
        movq    %rdx, -840(%rbp)                # 8-byte Spill
        leaq    8328(%r13), %rcx
        #DEBUG_VALUE: isc_log_doit:matched <- 0
        movq    %rcx, -808(%rbp)                # 8-byte Spill
.Ltmp662:
        .loc    3 1552 25 is_stmt 1             # log.c:1552:25

However, the value of %rcx at the movslq (%rcx,%rdx,4), %rdx statement becomes -2, so it attempts to access data before reltable.isc_log_doit. As that is in .rodata, this leads to a segfault.

The current working theory is that some code is hoisted out of the do-while loop starting at https://gitlab.isc.org/isc-projects/bind9/-/blob/main/lib/isc/log.c#L1531, in particular the [-level] accesses on lines 1613 and 1843:

                                snprintf(level_string, sizeof(level_string),
                                         "%s: ", log_level_strings[-level]);
...
                        } else {
                                syslog_level = syslog_map[-level];
                        }

but maybe these negative offsets confuse the lookup table converter?

jrtc27 added inline comments.Dec 10 2021, 2:46 PM
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
133

This line causes the bug seen in bind. In that case, the GEP has been hoisted, but the load has not. In general the GEP could be in a different basic block, or even in the same basic block with an instruction that may not return (intrinsic, real function call, well-defined language-level exception, etc). You can insert the reltable.shift where the GEP is, and that probably makes sense given it serves (part of) the same purpose, but you *must* insert the actual reltable.intrinsic where the original load is, unless you've gone to great lengths to prove it's safe not to (which seems best left to the usual culprits like LICM).

IR test cases: https://godbolt.org/z/YMdaMrobE (bind is characterised by the first of the two functions)

jrtc27 added inline comments.Dec 10 2021, 2:52 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
396

The meanings of code models isn't really portable across targets... e.g. RISC-V's medium (underlying LLVM name for -mcmodel=medany) assumes 32-bit PC-relative offsets, and thus using a 32-bit table-relative offset is safe

gulfem added inline comments.Dec 10 2021, 7:39 PM
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
133

@dim and @jrtc27 thank you for reporting it.
I see what's going wrong, and I uploaded a patch that fixes the issue by ensuring that the call to load.relative.intrinsic is inserted before the load, but not gep.
Please see https://reviews.llvm.org/D115571.