Page MenuHomePhabricator

[Passes] Add relative lookup table converter pass
Needs ReviewPublic

Authored by gulfem on Jan 8 2021, 6:28 PM.

Details

Summary

Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
  1. Add dso_local check
  2. Remove -relative-switch-lookup-table flag, and enable it by default
  3. Use shift instead of mul
  4. Remove /ARM test and merge it with /X86 test
gulfem marked 4 inline comments as done.Jan 14 2021, 6:30 PM
gulfem marked 2 inline comments as done.Jan 14 2021, 6:36 PM
gulfem added inline comments.
llvm/test/Transforms/SimplifyCFG/ARM/switch-to-relative-lookup-table.ll
1 ↗(On Diff #316538)

Any reason to have tests both under ARM/ and X86/ for this?

There are two existing switch-to-lookup tests one under ARM/ and one under /X86 (I don't know the original reason).
/ARM test is much simpler. I now put all the relative lookup tests under /X86.

Also I'd like to see a test which shows that the transformation happens for PIC code but not for non-PIC.

Could you please clarify that?
Do you want to see a test that checks the code for non-PIC case when -fno-PIC is enabled?

hans added inline comments.Jan 15 2021, 1:32 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5469 ↗(On Diff #316824)

Since this is a helper function used internally by the class, it should be private.

5473 ↗(On Diff #316824)

Actually, this should probably be private too.

5701 ↗(On Diff #316824)

It's a little annoying that we have to pass in a Builder here (and also to the SwitchLookupTable class) just to create integer types. I don't think a builder is strictly necessary for that.

But more importantly, I'm not sure hard-coding this to 64 bits for the pointer and 32 bits for the offset is correct. Shouldn't this depend on the target?

5740–5747 ↗(On Diff #316538)

One alternative would also be to bake the cases together:

case ArrayKind:
case RelOffsetArrayKind: {
... common stuff ...
if (Kind == RelOffsetArrayKind) {
  .. special stuff ..
}
llvm/test/Transforms/SimplifyCFG/ARM/switch-to-relative-lookup-table.ll
1 ↗(On Diff #316538)

Do you want to see a test that checks the code for non-PIC case when -fno-PIC is enabled?

Yes, exactly. This seems important since we don't want the transformation to run for the no-PIC case (and currently I don't see anything in the code which ensures that.)

It might help to put the relative lookup table test in a separate file, and use two invocations once with PIC and one without, and check the expected results with different FileCheck prefixes.

joerg added a subscriber: joerg.Jan 15 2021, 5:25 AM

Some targets already do this. Please check that you don't create regressions, especially on PowerPC.

hans added a comment.Jan 15 2021, 5:57 AM

Some targets already do this. Please check that you don't create regressions, especially on PowerPC.

Joerg: any pointers to where ppc does this and what it does exactly?

joerg added a comment.Jan 15 2021, 6:06 AM

It might be specific to the jump table case, but it should be instructional on how to do it. One important point is that it avoids inter-section relocations, which are a problem at least on MIPS.

Many backends generate PIC-friendly jump tables. This is about generating IR initializers that translate to the same kind of backend assembly expressions as those backends use for their jump tables.

On all targets I'm aware of, the only kind of expression involving two symbols like this that can be expressed at all in assembly / relocations is a PC-relative case where one of the symbols is in the same section of the same TU as the data containing the reference (and is not dynamically interposable, i.e. has internal linkage or is dso_local). I don't think mips or powerpc is at all unlike any other platform in this regard. Indeed this applies to targets where the pointer size is not 64 bits, too. But as far as I'm aware, in all cases there is a 32-bit signed offset option for PC-relative references and in many that's the only option. So it probably is reasonable to hard-code the table entry offset size as 32 bits, though the pointer size obviously should be whatever is the size of pointers on the particular target.

It's possible the 32-bit offsets won't work when in medium or large code models on 64-bit machines. On some machines that have code models where 32 bit offsets are not always sufficient, there are 64-bit PC-relative relocs you can use so the same method but using 64-bit table entries is feasible. But I'm not sure that's the case of all 64-bit machines (it is true of aarch64 and x86-64 when using ELF, but I don't know others off hand). So it might be necessary to parameterize either enabling the PIC-friendly flavor of the optimization, or the offset size to use in table entries, or both, based on the target and the code model setting as interpreted for each particular target.

gulfem updated this revision to Diff 318372.Jan 21 2021, 5:59 PM
  1. Apply relative lookup table generation in fPIC mode
  2. Add a test case for single value case
  3. Remove hard-coding 64 bit pointers
  4. Improve implementation and test coverage
Herald added a project: Restricted Project. · View Herald TranscriptJan 21 2021, 5:59 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
gulfem marked 4 inline comments as done.Jan 21 2021, 6:03 PM
gulfem marked 2 inline comments as done.Jan 21 2021, 6:49 PM

Almost there! Just a few more comments on my end.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5694 ↗(On Diff #318372)

I think we should also return false if it doesn't turn out to be a GlobalValue here.

llvm/test/Transforms/SimplifyCFG/X86/switch_to_relative_lookup_table.ll
201–203 ↗(On Diff #318372)

It looks like this might not be testing what the comment says it should. It looks like in the phi statements below that each of the cases have different values. I think maybe you'll want something like:

%str1.0 = phi i8* [ getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), %sw.default ], [ getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), %sw.bb2 ], [ getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), %sw.bb1 ], [ getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), %entry ]
ret i8* %str1.0  ; instead of `ret void` to ensure the `%str1.0` isn't optimized out

Then ensure this just lowers directly to a GEP for @.str and no lookup table is generated.

hans added inline comments.Jan 25 2021, 2:31 AM
clang/test/CodeGen/switch-to-lookup-table.c
3

Clang codegen tests are not normally used to test LLVM optimizations. I think the tests for this should all live in LLVM, not Clang.

(Also aarch64 is not guaranteed to be included as a target in the LLVM build, so this test would not necessarily run.)

7

This table and the one below are very hard to read like this. Could you split it into multiple lines using FNOPIC-SAME?

37

I think the minimum number of cases for the switch-to-lookup table transformation is only 4 or 5. To make the test easier to read, I'd suggest using the minimum number of cases in the switch.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5658 ↗(On Diff #318372)

This comment seems unnecessary, at this point we know we're generating the relative table.

5695 ↗(On Diff #318372)

I don't remember, will isDSOLocal() return true also if it's a private or internal symbol? Otherwise maybe this should check isLocalLinkage() also.

5709 ↗(On Diff #318372)

I do worry about hard-coding this to 32 bits. As someone pointed out, it would not necessary hold in all code models for x86.

Similarly to the PIC check in ShouldBuildRelLookupTable(), is there some check we could do to make sure 32 bits is appropriate?

5712 ↗(On Diff #318372)

The Builder points to a specific insertion point in a basic block for the lookup, so it knows the Module and adding the Module parameter is redundant.

llvm/test/Transforms/SimplifyCFG/X86/switch_to_relative_lookup_table.ll
7 ↗(On Diff #318372)

Same comment as I made for the test under clang/ above: I think fewer switch cases are probably enough to test this, and would make it easier to read. Also splitting the lookup tables over multiple lines would help too.

gulfem updated this revision to Diff 320025.Jan 28 2021, 6:52 PM
  1. Simplified test cases and increased readibility of the tables
  2. Added x86_64 and aarch64 check and tiny or small code modes check to ensure 32 offsets
  3. Modified single value test case
gulfem marked 9 inline comments as done.Jan 28 2021, 7:03 PM
gulfem added inline comments.
clang/test/CodeGen/switch-to-lookup-table.c
3

I'm not able to use -fPIC and -fno-PIC options in the opt tool.
I am setting the PIC Level flag to enable -fPIC in `opt.
I thought that testing -fPIC and -fno-PIC in the same file seems easier and more readable in CodeGen tests.
Please let me know if you have a good suggestion how to do that with opt.

I changed the target to x86_64-linux in this test.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5709 ↗(On Diff #318372)

I added x86_64 and aarch64 target check and tiny or small code mode check to ensure 32 offsets.
Please let me know if you have any other concerns about that.

Can you please add an explanation to the patch's description as to why
we don't want to instead convert non-relative/relative LUT's elsewhere,
please.

lebedev.ri added inline comments.Jan 29 2021, 1:38 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5512–5525 ↗(On Diff #320025)

This should be some TLI/TTI hook.

gulfem marked an inline comment as done.Jan 29 2021, 4:15 PM

Can you please add an explanation to the patch's description as to why
we don't want to instead convert non-relative/relative LUT's elsewhere,
please.

@mcgrathr gave some explanation to that:

Many backends generate PIC-friendly jump tables. This is about generating IR initializers that translate to the same kind of backend assembly expressions as those backends use for their jump tables.

I also want to add to that:
This task specifically tries make switch-to-lookup tables PIC-friendly, but it does not necessarily try to make all the jump tables PIC-friendly.
There is also another work going on to generate PIC-friendly vtables.
Therefore, converting non-relative lookup tables to relative tables elsewhere can be explored as a separate task.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5512–5525 ↗(On Diff #320025)

This should be some TLI/TTI hook.

Could you please elaborate on that?
Are you talking about getting the PIC level?

Can you please add an explanation to the patch's description as to why
we don't want to instead convert non-relative/relative LUT's elsewhere,
please.

@mcgrathr gave some explanation to that:

Many backends generate PIC-friendly jump tables. This is about generating IR initializers that translate to the same kind of backend assembly expressions as those backends use for their jump tables.

I also want to add to that:
This task specifically tries make switch-to-lookup tables PIC-friendly, but it does not necessarily try to make all the jump tables PIC-friendly.
There is also another work going on to generate PIC-friendly vtables.
Therefore, converting non-relative lookup tables to relative tables elsewhere can be explored as a separate task.

Personally, i read that as non-answer,
because this just reiterates that it can be done elsewhere,
and doesn't answer my question as to why that isn't the path taken.

joerg added a comment.Jan 30 2021, 5:32 AM

First of all, I find this patch to be nearly impossible to read. It seems to mix a lot of refactoring with a functional change, making it very hard to focus on the core.

The main difference to the jump table logic is that the latter knows that all referenced addresses are within a function and therefore well contained. Nothing of the like seems to be found here. E.g. if this is supposed to address only unnamed pointers, it should be grouping them together and compute the offsets and then pick the optimal size. That's a transformation that can be beneficial for all modes for a not too large table. But it is hard to see what is going on here with all the seemingly unrelated changes.

gulfem updated this revision to Diff 324144.Tue, Feb 16, 6:02 PM
gulfem marked an inline comment as not done.

Implement it as a separate pass and apply it to user-defined lookup tables as well.

@lebedev.ri based on your feedback, I added it as a separate pass and added support for user-defined lookup tables.
Please let me know if you have any comments.

gulfem retitled this revision from [SimplifyCFG] Add relative switch lookup tables to [Passes] Add relative lookup table generator pass.Wed, Feb 17, 8:54 AM
gulfem edited the summary of this revision. (Show Details)
gulfem edited the summary of this revision. (Show Details)
hans added a comment.Thu, Feb 18, 5:14 AM

Thanks for pushing this forward! I think this will be a nice transformation once all the details are worked out.

clang/test/CodeGen/switch-to-lookup-table.c
3

Buildbots may not have x86_64 as a registered target, so this test will break some buildbots.

I think the opt flags -relocation-model=pic and -relocation-model=static will do the right thing (see for example llvm/test/Transforms/SimplifyCFG/ARM/switch-to-lookup-table.ll

llvm/include/llvm/InitializePasses.h
321

In some places the pass is referred to as a generator and here it's a converter. I think converter is a better name, and it should be consistent.

llvm/include/llvm/Transforms/Utils/RelLookupTableGenerator.h
10 ↗(On Diff #324144)

For this kind of pass, I think it would be helpful to have a small example in the comment that shows what kind of IR it's looking for, and what it will be transformed to. (Either here or in the .cpp file.)

22 ↗(On Diff #324144)

No need to call it simple :)

33 ↗(On Diff #324144)

As clang-tidy points out, the comment doesn't match the actual macro name.

llvm/lib/Transforms/Utils/RelLookupTableGenerator.cpp
25 ↗(On Diff #324144)

Since the function below are only used within this file, they should be static or in an anonymous namespace. Since you're already using an anonymous namespace for the RelLookupTableConverterPass class, I'd suggest using that for these functions too.

28 ↗(On Diff #324144)

This explains the "what" of what the code does, but not the "why". Why should the transformation only run for these two targets?

34 ↗(On Diff #324144)

This also needs the "why".

39 ↗(On Diff #324144)

Again, this needs the "why".
And perhaps it would make sense to put this check first.

58 ↗(On Diff #324144)

The comment doesn't match the code exactly I think, since further down you also allow GetElementPtr expressions. Maybe the comment could be clearer.

78 ↗(On Diff #324144)

Should it be hasOneUse() or hasOneUser()?

Also maybe this check could some before the for-loop, since it's cheaper and may return early.

There probably also needs to be a check that the global has local linkage, otherwise it could be referenced outside the module.

88 ↗(On Diff #324144)

if you know that the initializer is a ConstantArray, you can use cast<> instead of dyn_cast<> here

91 ↗(On Diff #324144)

The int32 type here has been mentioned in reviews before.

I think at least, the code needs to have a good motivation for why 32 bits is enough.

119 ↗(On Diff #324144)

The "can" part of the name seems misleading, since this doesn't return true or false, but actually tries to build the lookup table (and possibly returns null if it can't). I'd just drop "can" from the name.

Oh, but there is already a generateRelLookupTable() function.. well, maybe there is a better name for one of them.

122 ↗(On Diff #324144)

This code which checks that the user of the table is a GEP, followed by a load, etc. feels like a continuation of the checks in shouldGenerateRelLookupTableForGlobal(). I would suggest just moving those checks into this function.

125 ↗(On Diff #324144)

getNextNode() doesn't seem right. It gets the next instruction, but that is not necessarily the same as the instruction using the GEP.

Also, the code probably needs to check that the GEP only has one use

129 ↗(On Diff #324144)

(nit: commonly, llvm code uses an M variable for the module)

166 ↗(On Diff #324144)

I'd suggest moving this to the top of the function, before even declaring Changed.

169 ↗(On Diff #324144)

This would be simpler as

for (GlobalVariable *GlobalVar : M.globals()) {
llvm/test/Transforms/SimplifyCFG/X86/relative_lookup_table.ll
3

(you could skip the -mtriple argument here since there is a "target triple" line in the IR below)

It looks like you have everything setup for running on the new PM, but it doesn't look like the pass is added anywhere in the new PM pipeline. Unfortunately, I don't *think* there's anything in the new PM that acts as a "default pipeline that gets added to all other pipelines" similar to PassManagerBuilder::populateModulePassManager, so we'll need to manually include this somewhere in PassBuilder::buildO0DefaultPipeline and PassBuilder::buildPerModuleDefaultPipeline.

Depending on if we also want to support this in [thin]LTO, we may need to add this to more pipelines.

llvm/lib/Transforms/Utils/RelLookupTableGenerator.cpp
46 ↗(On Diff #324144)

We should also check if the switch table itself is dso_local since the right relocation won't be generated if it isn't.

51 ↗(On Diff #324144)

cast since we checked before this is a ConstantArray. Alternatively, what you could have is:

if (!GlobalVar.hasInitializer())
  return false;

ConstantArray *Array = dyn_cast<ConstantArray>(GlobalVar.getInitializer());
if (!Array || !Array->getType()->getElementType()->isPointerTy())
  return false;
61 ↗(On Diff #324144)

Rather than checking the linkage explicitly, you can use isImplicitDSOLocal which also has some visibility checks.

GlobalVarOp->isDSOLocal() || GlobalVarOp->isImplicitDSOLocal()
64–70 ↗(On Diff #324144)

It seems that with this, we're limiting this to only arrays with GEPs with globals as the base, but I think this will return false if the array element is just a dso_local global. We definitely should still be taking into account GEPs though.

I'm thinking IsConstantOffsetFromGlobal might be more useful here since it already contains a bunch of logic for handling ConstantExpr GEPs, then you can check if the global found by that is dso_local.

76–77 ↗(On Diff #324144)

What's the reason for why the number of users matters?

92–95 ↗(On Diff #324144)

I think the visibility and linkage should be set the same as those of the original lookup table.

I think to avoid many changes to the original lookup table and only focus on the new layout, we should also propagate any properties of the original table to the new relative table. That is, visibility, linkage, attributes, unnamed_addr, etc. should be copied from the original.

(For copying attributes, you can use copyAttributesFrom.)

llvm/test/Transforms/SimplifyCFG/X86/relative_lookup_table.ll
32

We should also have cases that cover other linkages/visibilities:

  • If the table elements are extern dso_local or extern hidden, we should still expect a relative lookup table
  • If the switch table is not dso_local/hidden, we shouldn't expect a relative lookup table
gulfem updated this revision to Diff 326265.Wed, Feb 24, 6:45 PM
  • Rename the pass to RelLookupTableConverter to be consistent
  • Addressed reviewers' feedback
  • Added tests for user-defined lookup tables and hidden visibility
gulfem retitled this revision from [Passes] Add relative lookup table generator pass to [Passes] Add relative lookup table converter pass.Wed, Feb 24, 6:47 PM
gulfem marked 17 inline comments as done.Wed, Feb 24, 6:54 PM
gulfem added inline comments.
clang/test/CodeGen/switch-to-lookup-table.c
3

I added x86-registered-target to ensure that it only runs on buildbots that have x86_64 as a registered target

llvm/lib/Transforms/Utils/RelLookupTableGenerator.cpp
39 ↗(On Diff #324144)

I added the reason, please let me know if that's not clear enough.

169 ↗(On Diff #324144)

Please keep that in mind that I'm deleting a global variable while I'm iterating over global variables.
I don't want to have invalidated iterator, and this is why I did not use a simple for loop.

Thanks for pushing this forward! I think this will be a nice transformation once all the details are worked out.

Thank you very much for all of your wonderful constructive feedback!
I learned much more about LLVM and IR internals.
Appreciate all your help!

It looks like you have everything setup for running on the new PM, but it doesn't look like the pass is added anywhere in the new PM pipeline.

Thank you very much for pointing that out @leonardchan !
I added this pass into both pass managers now.

gulfem added inline comments.Wed, Feb 24, 7:09 PM
llvm/lib/Transforms/Utils/RelLookupTableGenerator.cpp
76–77 ↗(On Diff #324144)

We generate one lookup table per switch statement whenever possible.
I think there is only one use of that lookup table which is the`GetElementPtr` instruction.
That is why I checked for one use.
Do you see any issue with that?

gulfem updated this revision to Diff 328354.Thu, Mar 4, 6:29 PM

Use TTI hook for target machine checks