Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a pass that converts lookup tables to relative lookup tables to make them PIC-friendly.
Details
- Reviewers
leonardchan mcgrathr phosek hans
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
- Add dso_local check
- Remove -relative-switch-lookup-table flag, and enable it by default
- Use shift instead of mul
- Remove /ARM test and merge it with /X86 test
llvm/test/Transforms/SimplifyCFG/ARM/switch-to-relative-lookup-table.ll | ||
---|---|---|
1 ↗ | (On Diff #316538) |
There are two existing switch-to-lookup tests one under ARM/ and one under /X86 (I don't know the original reason).
Could you please clarify that? |
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | ||
---|---|---|
5469 ↗ | (On Diff #316824) | Since this is a helper function used internally by the class, it should be private. |
5473 ↗ | (On Diff #316824) | Actually, this should probably be private too. |
5701 ↗ | (On Diff #316824) | It's a little annoying that we have to pass in a Builder here (and also to the SwitchLookupTable class) just to create integer types. I don't think a builder is strictly necessary for that. But more importantly, I'm not sure hard-coding this to 64 bits for the pointer and 32 bits for the offset is correct. Shouldn't this depend on the target? |
5740–5747 ↗ | (On Diff #316538) | One alternative would also be to bake the cases together: case ArrayKind: case RelOffsetArrayKind: { ... common stuff ... if (Kind == RelOffsetArrayKind) { .. special stuff .. } |
llvm/test/Transforms/SimplifyCFG/ARM/switch-to-relative-lookup-table.ll | ||
1 ↗ | (On Diff #316538) |
Yes, exactly. This seems important since we don't want the transformation to run for the no-PIC case (and currently I don't see anything in the code which ensures that.) It might help to put the relative lookup table test in a separate file, and use two invocations once with PIC and one without, and check the expected results with different FileCheck prefixes. |
Some targets already do this. Please check that you don't create regressions, especially on PowerPC.
It might be specific to the jump table case, but it should be instructional on how to do it. One important point is that it avoids inter-section relocations, which are a problem at least on MIPS.
Many backends generate PIC-friendly jump tables. This is about generating IR initializers that translate to the same kind of backend assembly expressions as those backends use for their jump tables.
On all targets I'm aware of, the only kind of expression involving two symbols like this that can be expressed at all in assembly / relocations is a PC-relative case where one of the symbols is in the same section of the same TU as the data containing the reference (and is not dynamically interposable, i.e. has internal linkage or is dso_local). I don't think mips or powerpc is at all unlike any other platform in this regard. Indeed this applies to targets where the pointer size is not 64 bits, too. But as far as I'm aware, in all cases there is a 32-bit signed offset option for PC-relative references and in many that's the only option. So it probably is reasonable to hard-code the table entry offset size as 32 bits, though the pointer size obviously should be whatever is the size of pointers on the particular target.
It's possible the 32-bit offsets won't work when in medium or large code models on 64-bit machines. On some machines that have code models where 32 bit offsets are not always sufficient, there are 64-bit PC-relative relocs you can use so the same method but using 64-bit table entries is feasible. But I'm not sure that's the case of all 64-bit machines (it is true of aarch64 and x86-64 when using ELF, but I don't know others off hand). So it might be necessary to parameterize either enabling the PIC-friendly flavor of the optimization, or the offset size to use in table entries, or both, based on the target and the code model setting as interpreted for each particular target.
- Apply relative lookup table generation in fPIC mode
- Add a test case for single value case
- Remove hard-coding 64 bit pointers
- Improve implementation and test coverage
Almost there! Just a few more comments on my end.
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | ||
---|---|---|
5694 ↗ | (On Diff #318372) | I think we should also return false if it doesn't turn out to be a GlobalValue here. |
llvm/test/Transforms/SimplifyCFG/X86/switch_to_relative_lookup_table.ll | ||
201–203 ↗ | (On Diff #318372) | It looks like this might not be testing what the comment says it should. It looks like in the phi statements below that each of the cases have different values. I think maybe you'll want something like: %str1.0 = phi i8* [ getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), %sw.default ], [ getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), %sw.bb2 ], [ getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), %sw.bb1 ], [ getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), %entry ] ret i8* %str1.0 ; instead of `ret void` to ensure the `%str1.0` isn't optimized out Then ensure this just lowers directly to a GEP for @.str and no lookup table is generated. |
clang/test/CodeGen/switch-to-lookup-table.c | ||
---|---|---|
3 | Clang codegen tests are not normally used to test LLVM optimizations. I think the tests for this should all live in LLVM, not Clang. (Also aarch64 is not guaranteed to be included as a target in the LLVM build, so this test would not necessarily run.) | |
7 | This table and the one below are very hard to read like this. Could you split it into multiple lines using FNOPIC-SAME? | |
37 | I think the minimum number of cases for the switch-to-lookup table transformation is only 4 or 5. To make the test easier to read, I'd suggest using the minimum number of cases in the switch. | |
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | ||
5658 ↗ | (On Diff #318372) | This comment seems unnecessary, at this point we know we're generating the relative table. |
5695 ↗ | (On Diff #318372) | I don't remember, will isDSOLocal() return true also if it's a private or internal symbol? Otherwise maybe this should check isLocalLinkage() also. |
5709 ↗ | (On Diff #318372) | I do worry about hard-coding this to 32 bits. As someone pointed out, it would not necessary hold in all code models for x86. Similarly to the PIC check in ShouldBuildRelLookupTable(), is there some check we could do to make sure 32 bits is appropriate? |
5712 ↗ | (On Diff #318372) | The Builder points to a specific insertion point in a basic block for the lookup, so it knows the Module and adding the Module parameter is redundant. |
llvm/test/Transforms/SimplifyCFG/X86/switch_to_relative_lookup_table.ll | ||
7 ↗ | (On Diff #318372) | Same comment as I made for the test under clang/ above: I think fewer switch cases are probably enough to test this, and would make it easier to read. Also splitting the lookup tables over multiple lines would help too. |
- Simplified test cases and increased readibility of the tables
- Added x86_64 and aarch64 check and tiny or small code modes check to ensure 32 offsets
- Modified single value test case
clang/test/CodeGen/switch-to-lookup-table.c | ||
---|---|---|
3 | I'm not able to use -fPIC and -fno-PIC options in the opt tool. I changed the target to x86_64-linux in this test. | |
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | ||
5709 ↗ | (On Diff #318372) | I added x86_64 and aarch64 target check and tiny or small code mode check to ensure 32 offsets. |
Can you please add an explanation to the patch's description as to why
we don't want to instead convert non-relative/relative LUT's elsewhere,
please.
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | ||
---|---|---|
5512–5525 ↗ | (On Diff #320025) | This should be some TLI/TTI hook. |
@mcgrathr gave some explanation to that:
Many backends generate PIC-friendly jump tables. This is about generating IR initializers that translate to the same kind of backend assembly expressions as those backends use for their jump tables.
I also want to add to that:
This task specifically tries make switch-to-lookup tables PIC-friendly, but it does not necessarily try to make all the jump tables PIC-friendly.
There is also another work going on to generate PIC-friendly vtables.
Therefore, converting non-relative lookup tables to relative tables elsewhere can be explored as a separate task.
llvm/lib/Transforms/Utils/SimplifyCFG.cpp | ||
---|---|---|
5512–5525 ↗ | (On Diff #320025) |
Could you please elaborate on that? |
Personally, i read that as non-answer,
because this just reiterates that it can be done elsewhere,
and doesn't answer my question as to why that isn't the path taken.
First of all, I find this patch to be nearly impossible to read. It seems to mix a lot of refactoring with a functional change, making it very hard to focus on the core.
The main difference to the jump table logic is that the latter knows that all referenced addresses are within a function and therefore well contained. Nothing of the like seems to be found here. E.g. if this is supposed to address only unnamed pointers, it should be grouping them together and compute the offsets and then pick the optimal size. That's a transformation that can be beneficial for all modes for a not too large table. But it is hard to see what is going on here with all the seemingly unrelated changes.
@lebedev.ri based on your feedback, I added it as a separate pass and added support for user-defined lookup tables.
Please let me know if you have any comments.
Thanks for pushing this forward! I think this will be a nice transformation once all the details are worked out.
clang/test/CodeGen/switch-to-lookup-table.c | ||
---|---|---|
3 | Buildbots may not have x86_64 as a registered target, so this test will break some buildbots. I think the opt flags -relocation-model=pic and -relocation-model=static will do the right thing (see for example llvm/test/Transforms/SimplifyCFG/ARM/switch-to-lookup-table.ll | |
llvm/include/llvm/InitializePasses.h | ||
321 | In some places the pass is referred to as a generator and here it's a converter. I think converter is a better name, and it should be consistent. | |
llvm/include/llvm/Transforms/Utils/RelLookupTableGenerator.h | ||
10 ↗ | (On Diff #324144) | For this kind of pass, I think it would be helpful to have a small example in the comment that shows what kind of IR it's looking for, and what it will be transformed to. (Either here or in the .cpp file.) |
22 ↗ | (On Diff #324144) | No need to call it simple :) |
33 ↗ | (On Diff #324144) | As clang-tidy points out, the comment doesn't match the actual macro name. |
llvm/lib/Transforms/Utils/RelLookupTableGenerator.cpp | ||
25 ↗ | (On Diff #324144) | Since the function below are only used within this file, they should be static or in an anonymous namespace. Since you're already using an anonymous namespace for the RelLookupTableConverterPass class, I'd suggest using that for these functions too. |
28 ↗ | (On Diff #324144) | This explains the "what" of what the code does, but not the "why". Why should the transformation only run for these two targets? |
34 ↗ | (On Diff #324144) | This also needs the "why". |
39 ↗ | (On Diff #324144) | Again, this needs the "why". |
58 ↗ | (On Diff #324144) | The comment doesn't match the code exactly I think, since further down you also allow GetElementPtr expressions. Maybe the comment could be clearer. |
78 ↗ | (On Diff #324144) | Should it be hasOneUse() or hasOneUser()? Also maybe this check could some before the for-loop, since it's cheaper and may return early. There probably also needs to be a check that the global has local linkage, otherwise it could be referenced outside the module. |
88 ↗ | (On Diff #324144) | if you know that the initializer is a ConstantArray, you can use cast<> instead of dyn_cast<> here |
91 ↗ | (On Diff #324144) | The int32 type here has been mentioned in reviews before. I think at least, the code needs to have a good motivation for why 32 bits is enough. |
119 ↗ | (On Diff #324144) | The "can" part of the name seems misleading, since this doesn't return true or false, but actually tries to build the lookup table (and possibly returns null if it can't). I'd just drop "can" from the name. Oh, but there is already a generateRelLookupTable() function.. well, maybe there is a better name for one of them. |
122 ↗ | (On Diff #324144) | This code which checks that the user of the table is a GEP, followed by a load, etc. feels like a continuation of the checks in shouldGenerateRelLookupTableForGlobal(). I would suggest just moving those checks into this function. |
125 ↗ | (On Diff #324144) | getNextNode() doesn't seem right. It gets the next instruction, but that is not necessarily the same as the instruction using the GEP. Also, the code probably needs to check that the GEP only has one use |
129 ↗ | (On Diff #324144) | (nit: commonly, llvm code uses an M variable for the module) |
166 ↗ | (On Diff #324144) | I'd suggest moving this to the top of the function, before even declaring Changed. |
169 ↗ | (On Diff #324144) | This would be simpler as for (GlobalVariable *GlobalVar : M.globals()) { |
llvm/test/Transforms/SimplifyCFG/X86/relative_lookup_table.ll | ||
3 | (you could skip the -mtriple argument here since there is a "target triple" line in the IR below) |
It looks like you have everything setup for running on the new PM, but it doesn't look like the pass is added anywhere in the new PM pipeline. Unfortunately, I don't *think* there's anything in the new PM that acts as a "default pipeline that gets added to all other pipelines" similar to PassManagerBuilder::populateModulePassManager, so we'll need to manually include this somewhere in PassBuilder::buildO0DefaultPipeline and PassBuilder::buildPerModuleDefaultPipeline.
Depending on if we also want to support this in [thin]LTO, we may need to add this to more pipelines.
llvm/lib/Transforms/Utils/RelLookupTableGenerator.cpp | ||
---|---|---|
46 ↗ | (On Diff #324144) | We should also check if the switch table itself is dso_local since the right relocation won't be generated if it isn't. |
51 ↗ | (On Diff #324144) | cast since we checked before this is a ConstantArray. Alternatively, what you could have is: if (!GlobalVar.hasInitializer()) return false; ConstantArray *Array = dyn_cast<ConstantArray>(GlobalVar.getInitializer()); if (!Array || !Array->getType()->getElementType()->isPointerTy()) return false; |
61 ↗ | (On Diff #324144) | Rather than checking the linkage explicitly, you can use isImplicitDSOLocal which also has some visibility checks. GlobalVarOp->isDSOLocal() || GlobalVarOp->isImplicitDSOLocal() |
64–70 ↗ | (On Diff #324144) | It seems that with this, we're limiting this to only arrays with GEPs with globals as the base, but I think this will return false if the array element is just a dso_local global. We definitely should still be taking into account GEPs though. I'm thinking IsConstantOffsetFromGlobal might be more useful here since it already contains a bunch of logic for handling ConstantExpr GEPs, then you can check if the global found by that is dso_local. |
76–77 ↗ | (On Diff #324144) | What's the reason for why the number of users matters? |
92–95 ↗ | (On Diff #324144) | I think the visibility and linkage should be set the same as those of the original lookup table. I think to avoid many changes to the original lookup table and only focus on the new layout, we should also propagate any properties of the original table to the new relative table. That is, visibility, linkage, attributes, unnamed_addr, etc. should be copied from the original. (For copying attributes, you can use copyAttributesFrom.) |
llvm/test/Transforms/SimplifyCFG/X86/relative_lookup_table.ll | ||
32 | We should also have cases that cover other linkages/visibilities:
|
- Rename the pass to RelLookupTableConverter to be consistent
- Addressed reviewers' feedback
- Added tests for user-defined lookup tables and hidden visibility
clang/test/CodeGen/switch-to-lookup-table.c | ||
---|---|---|
3 | I added x86-registered-target to ensure that it only runs on buildbots that have x86_64 as a registered target | |
llvm/lib/Transforms/Utils/RelLookupTableGenerator.cpp | ||
39 ↗ | (On Diff #324144) | I added the reason, please let me know if that's not clear enough. |
169 ↗ | (On Diff #324144) | Please keep that in mind that I'm deleting a global variable while I'm iterating over global variables. |
Thanks for pushing this forward! I think this will be a nice transformation once all the details are worked out.
Thank you very much for all of your wonderful constructive feedback!
I learned much more about LLVM and IR internals.
Appreciate all your help!
Thank you very much for pointing that out @leonardchan !
I added this pass into both pass managers now.
llvm/lib/Transforms/Utils/RelLookupTableGenerator.cpp | ||
---|---|---|
76–77 ↗ | (On Diff #324144) | We generate one lookup table per switch statement whenever possible. |