This is an archive of the discontinued LLVM Phabricator instance.

[Passes] Add relative lookup table converter pass
ClosedPublic

Authored by gulfem on Jan 8 2021, 6:28 PM.

Details

Summary

Lookup tables generate non PIC-friendly code, which requires dynamic relocation as described in:
https://bugs.llvm.org/show_bug.cgi?id=45244
This patch adds a pass that converts lookup tables to relative lookup tables to make them PIC-friendly.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
leonardchan added inline comments.Jan 14 2021, 11:07 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5740–5747

Nit: Since this seems to be the exact same as the start of the ArrayKind case, it might be cleaner to move this into a lambda or function.

gulfem updated this revision to Diff 316824.Jan 14 2021, 6:28 PM
  1. Add dso_local check
  2. Remove -relative-switch-lookup-table flag, and enable it by default
  3. Use shift instead of mul
  4. Remove /ARM test and merge it with /X86 test
gulfem marked 4 inline comments as done.Jan 14 2021, 6:30 PM
gulfem marked 2 inline comments as done.Jan 14 2021, 6:36 PM
gulfem added inline comments.
llvm/test/Transforms/SimplifyCFG/ARM/switch-to-relative-lookup-table.ll
1

Any reason to have tests both under ARM/ and X86/ for this?

There are two existing switch-to-lookup tests one under ARM/ and one under /X86 (I don't know the original reason).
/ARM test is much simpler. I now put all the relative lookup tests under /X86.

Also I'd like to see a test which shows that the transformation happens for PIC code but not for non-PIC.

Could you please clarify that?
Do you want to see a test that checks the code for non-PIC case when -fno-PIC is enabled?

hans added inline comments.Jan 15 2021, 1:32 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5469

Since this is a helper function used internally by the class, it should be private.

5473

Actually, this should probably be private too.

5690

It's a little annoying that we have to pass in a Builder here (and also to the SwitchLookupTable class) just to create integer types. I don't think a builder is strictly necessary for that.

But more importantly, I'm not sure hard-coding this to 64 bits for the pointer and 32 bits for the offset is correct. Shouldn't this depend on the target?

5740–5747

One alternative would also be to bake the cases together:

case ArrayKind:
case RelOffsetArrayKind: {
... common stuff ...
if (Kind == RelOffsetArrayKind) {
  .. special stuff ..
}
llvm/test/Transforms/SimplifyCFG/ARM/switch-to-relative-lookup-table.ll
1

Do you want to see a test that checks the code for non-PIC case when -fno-PIC is enabled?

Yes, exactly. This seems important since we don't want the transformation to run for the no-PIC case (and currently I don't see anything in the code which ensures that.)

It might help to put the relative lookup table test in a separate file, and use two invocations once with PIC and one without, and check the expected results with different FileCheck prefixes.

joerg added a subscriber: joerg.Jan 15 2021, 5:25 AM

Some targets already do this. Please check that you don't create regressions, especially on PowerPC.

hans added a comment.Jan 15 2021, 5:57 AM

Some targets already do this. Please check that you don't create regressions, especially on PowerPC.

Joerg: any pointers to where ppc does this and what it does exactly?

joerg added a comment.Jan 15 2021, 6:06 AM

It might be specific to the jump table case, but it should be instructional on how to do it. One important point is that it avoids inter-section relocations, which are a problem at least on MIPS.

Many backends generate PIC-friendly jump tables. This is about generating IR initializers that translate to the same kind of backend assembly expressions as those backends use for their jump tables.

On all targets I'm aware of, the only kind of expression involving two symbols like this that can be expressed at all in assembly / relocations is a PC-relative case where one of the symbols is in the same section of the same TU as the data containing the reference (and is not dynamically interposable, i.e. has internal linkage or is dso_local). I don't think mips or powerpc is at all unlike any other platform in this regard. Indeed this applies to targets where the pointer size is not 64 bits, too. But as far as I'm aware, in all cases there is a 32-bit signed offset option for PC-relative references and in many that's the only option. So it probably is reasonable to hard-code the table entry offset size as 32 bits, though the pointer size obviously should be whatever is the size of pointers on the particular target.

It's possible the 32-bit offsets won't work when in medium or large code models on 64-bit machines. On some machines that have code models where 32 bit offsets are not always sufficient, there are 64-bit PC-relative relocs you can use so the same method but using 64-bit table entries is feasible. But I'm not sure that's the case of all 64-bit machines (it is true of aarch64 and x86-64 when using ELF, but I don't know others off hand). So it might be necessary to parameterize either enabling the PIC-friendly flavor of the optimization, or the offset size to use in table entries, or both, based on the target and the code model setting as interpreted for each particular target.

gulfem updated this revision to Diff 318372.Jan 21 2021, 5:59 PM
  1. Apply relative lookup table generation in fPIC mode
  2. Add a test case for single value case
  3. Remove hard-coding 64 bit pointers
  4. Improve implementation and test coverage
Herald added a project: Restricted Project. · View Herald TranscriptJan 21 2021, 5:59 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
gulfem marked 4 inline comments as done.Jan 21 2021, 6:03 PM
gulfem marked 2 inline comments as done.Jan 21 2021, 6:49 PM

Almost there! Just a few more comments on my end.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5719

I think we should also return false if it doesn't turn out to be a GlobalValue here.

llvm/test/Transforms/SimplifyCFG/X86/switch_to_relative_lookup_table.ll
202–204

It looks like this might not be testing what the comment says it should. It looks like in the phi statements below that each of the cases have different values. I think maybe you'll want something like:

%str1.0 = phi i8* [ getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), %sw.default ], [ getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), %sw.bb2 ], [ getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), %sw.bb1 ], [ getelementptr inbounds ([5 x i8], [5 x i8]* @.str, i64 0, i64 0), %entry ]
ret i8* %str1.0  ; instead of `ret void` to ensure the `%str1.0` isn't optimized out

Then ensure this just lowers directly to a GEP for @.str and no lookup table is generated.

hans added inline comments.Jan 25 2021, 2:31 AM
clang/test/CodeGen/switch-to-lookup-table.c
2 ↗(On Diff #318372)

Clang codegen tests are not normally used to test LLVM optimizations. I think the tests for this should all live in LLVM, not Clang.

(Also aarch64 is not guaranteed to be included as a target in the LLVM build, so this test would not necessarily run.)

6 ↗(On Diff #318372)

This table and the one below are very hard to read like this. Could you split it into multiple lines using FNOPIC-SAME?

36 ↗(On Diff #318372)

I think the minimum number of cases for the switch-to-lookup table transformation is only 4 or 5. To make the test easier to read, I'd suggest using the minimum number of cases in the switch.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5683

This comment seems unnecessary, at this point we know we're generating the relative table.

5684

I do worry about hard-coding this to 32 bits. As someone pointed out, it would not necessary hold in all code models for x86.

Similarly to the PIC check in ShouldBuildRelLookupTable(), is there some check we could do to make sure 32 bits is appropriate?

5687–5688

The Builder points to a specific insertion point in a basic block for the lookup, so it knows the Module and adding the Module parameter is redundant.

5720

I don't remember, will isDSOLocal() return true also if it's a private or internal symbol? Otherwise maybe this should check isLocalLinkage() also.

llvm/test/Transforms/SimplifyCFG/X86/switch_to_relative_lookup_table.ll
8

Same comment as I made for the test under clang/ above: I think fewer switch cases are probably enough to test this, and would make it easier to read. Also splitting the lookup tables over multiple lines would help too.

gulfem updated this revision to Diff 320025.Jan 28 2021, 6:52 PM
  1. Simplified test cases and increased readibility of the tables
  2. Added x86_64 and aarch64 check and tiny or small code modes check to ensure 32 offsets
  3. Modified single value test case
gulfem marked 9 inline comments as done.Jan 28 2021, 7:03 PM
gulfem added inline comments.
clang/test/CodeGen/switch-to-lookup-table.c
2 ↗(On Diff #318372)

I'm not able to use -fPIC and -fno-PIC options in the opt tool.
I am setting the PIC Level flag to enable -fPIC in `opt.
I thought that testing -fPIC and -fno-PIC in the same file seems easier and more readable in CodeGen tests.
Please let me know if you have a good suggestion how to do that with opt.

I changed the target to x86_64-linux in this test.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5684

I added x86_64 and aarch64 target check and tiny or small code mode check to ensure 32 offsets.
Please let me know if you have any other concerns about that.

Can you please add an explanation to the patch's description as to why
we don't want to instead convert non-relative/relative LUT's elsewhere,
please.

lebedev.ri added inline comments.Jan 29 2021, 1:38 AM
llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5690–5703

This should be some TLI/TTI hook.

gulfem marked an inline comment as done.Jan 29 2021, 4:15 PM

Can you please add an explanation to the patch's description as to why
we don't want to instead convert non-relative/relative LUT's elsewhere,
please.

@mcgrathr gave some explanation to that:

Many backends generate PIC-friendly jump tables. This is about generating IR initializers that translate to the same kind of backend assembly expressions as those backends use for their jump tables.

I also want to add to that:
This task specifically tries make switch-to-lookup tables PIC-friendly, but it does not necessarily try to make all the jump tables PIC-friendly.
There is also another work going on to generate PIC-friendly vtables.
Therefore, converting non-relative lookup tables to relative tables elsewhere can be explored as a separate task.

llvm/lib/Transforms/Utils/SimplifyCFG.cpp
5690–5703

This should be some TLI/TTI hook.

Could you please elaborate on that?
Are you talking about getting the PIC level?

Can you please add an explanation to the patch's description as to why
we don't want to instead convert non-relative/relative LUT's elsewhere,
please.

@mcgrathr gave some explanation to that:

Many backends generate PIC-friendly jump tables. This is about generating IR initializers that translate to the same kind of backend assembly expressions as those backends use for their jump tables.

I also want to add to that:
This task specifically tries make switch-to-lookup tables PIC-friendly, but it does not necessarily try to make all the jump tables PIC-friendly.
There is also another work going on to generate PIC-friendly vtables.
Therefore, converting non-relative lookup tables to relative tables elsewhere can be explored as a separate task.

Personally, i read that as non-answer,
because this just reiterates that it can be done elsewhere,
and doesn't answer my question as to why that isn't the path taken.

joerg added a comment.Jan 30 2021, 5:32 AM

First of all, I find this patch to be nearly impossible to read. It seems to mix a lot of refactoring with a functional change, making it very hard to focus on the core.

The main difference to the jump table logic is that the latter knows that all referenced addresses are within a function and therefore well contained. Nothing of the like seems to be found here. E.g. if this is supposed to address only unnamed pointers, it should be grouping them together and compute the offsets and then pick the optimal size. That's a transformation that can be beneficial for all modes for a not too large table. But it is hard to see what is going on here with all the seemingly unrelated changes.

gulfem updated this revision to Diff 324144.Feb 16 2021, 6:02 PM
gulfem marked an inline comment as not done.

Implement it as a separate pass and apply it to user-defined lookup tables as well.

@lebedev.ri based on your feedback, I added it as a separate pass and added support for user-defined lookup tables.
Please let me know if you have any comments.

gulfem retitled this revision from [SimplifyCFG] Add relative switch lookup tables to [Passes] Add relative lookup table generator pass.Feb 17 2021, 8:54 AM
gulfem edited the summary of this revision. (Show Details)
gulfem edited the summary of this revision. (Show Details)
hans added a comment.Feb 18 2021, 5:14 AM

Thanks for pushing this forward! I think this will be a nice transformation once all the details are worked out.

clang/test/CodeGen/switch-to-lookup-table.c
2 ↗(On Diff #318372)

Buildbots may not have x86_64 as a registered target, so this test will break some buildbots.

I think the opt flags -relocation-model=pic and -relocation-model=static will do the right thing (see for example llvm/test/Transforms/SimplifyCFG/ARM/switch-to-lookup-table.ll

llvm/include/llvm/InitializePasses.h
321 ↗(On Diff #324144)

In some places the pass is referred to as a generator and here it's a converter. I think converter is a better name, and it should be consistent.

llvm/include/llvm/Transforms/Utils/RelLookupTableGenerator.h
10 ↗(On Diff #324144)

For this kind of pass, I think it would be helpful to have a small example in the comment that shows what kind of IR it's looking for, and what it will be transformed to. (Either here or in the .cpp file.)

22 ↗(On Diff #324144)

No need to call it simple :)

33 ↗(On Diff #324144)

As clang-tidy points out, the comment doesn't match the actual macro name.

llvm/lib/Transforms/Utils/RelLookupTableGenerator.cpp
25 ↗(On Diff #324144)

Since the function below are only used within this file, they should be static or in an anonymous namespace. Since you're already using an anonymous namespace for the RelLookupTableConverterPass class, I'd suggest using that for these functions too.

28 ↗(On Diff #324144)

This explains the "what" of what the code does, but not the "why". Why should the transformation only run for these two targets?

34 ↗(On Diff #324144)

This also needs the "why".

39 ↗(On Diff #324144)

Again, this needs the "why".
And perhaps it would make sense to put this check first.

58 ↗(On Diff #324144)

The comment doesn't match the code exactly I think, since further down you also allow GetElementPtr expressions. Maybe the comment could be clearer.

78 ↗(On Diff #324144)

Should it be hasOneUse() or hasOneUser()?

Also maybe this check could some before the for-loop, since it's cheaper and may return early.

There probably also needs to be a check that the global has local linkage, otherwise it could be referenced outside the module.

88 ↗(On Diff #324144)

if you know that the initializer is a ConstantArray, you can use cast<> instead of dyn_cast<> here

91 ↗(On Diff #324144)

The int32 type here has been mentioned in reviews before.

I think at least, the code needs to have a good motivation for why 32 bits is enough.

119 ↗(On Diff #324144)

The "can" part of the name seems misleading, since this doesn't return true or false, but actually tries to build the lookup table (and possibly returns null if it can't). I'd just drop "can" from the name.

Oh, but there is already a generateRelLookupTable() function.. well, maybe there is a better name for one of them.

122 ↗(On Diff #324144)

This code which checks that the user of the table is a GEP, followed by a load, etc. feels like a continuation of the checks in shouldGenerateRelLookupTableForGlobal(). I would suggest just moving those checks into this function.

125 ↗(On Diff #324144)

getNextNode() doesn't seem right. It gets the next instruction, but that is not necessarily the same as the instruction using the GEP.

Also, the code probably needs to check that the GEP only has one use

129 ↗(On Diff #324144)

(nit: commonly, llvm code uses an M variable for the module)

166 ↗(On Diff #324144)

I'd suggest moving this to the top of the function, before even declaring Changed.

169 ↗(On Diff #324144)

This would be simpler as

for (GlobalVariable *GlobalVar : M.globals()) {
llvm/test/Transforms/SimplifyCFG/X86/relative_lookup_table.ll
2 ↗(On Diff #324144)

(you could skip the -mtriple argument here since there is a "target triple" line in the IR below)

It looks like you have everything setup for running on the new PM, but it doesn't look like the pass is added anywhere in the new PM pipeline. Unfortunately, I don't *think* there's anything in the new PM that acts as a "default pipeline that gets added to all other pipelines" similar to PassManagerBuilder::populateModulePassManager, so we'll need to manually include this somewhere in PassBuilder::buildO0DefaultPipeline and PassBuilder::buildPerModuleDefaultPipeline.

Depending on if we also want to support this in [thin]LTO, we may need to add this to more pipelines.

llvm/lib/Transforms/Utils/RelLookupTableGenerator.cpp
46 ↗(On Diff #324144)

We should also check if the switch table itself is dso_local since the right relocation won't be generated if it isn't.

51 ↗(On Diff #324144)

cast since we checked before this is a ConstantArray. Alternatively, what you could have is:

if (!GlobalVar.hasInitializer())
  return false;

ConstantArray *Array = dyn_cast<ConstantArray>(GlobalVar.getInitializer());
if (!Array || !Array->getType()->getElementType()->isPointerTy())
  return false;
61 ↗(On Diff #324144)

Rather than checking the linkage explicitly, you can use isImplicitDSOLocal which also has some visibility checks.

GlobalVarOp->isDSOLocal() || GlobalVarOp->isImplicitDSOLocal()
64–70 ↗(On Diff #324144)

It seems that with this, we're limiting this to only arrays with GEPs with globals as the base, but I think this will return false if the array element is just a dso_local global. We definitely should still be taking into account GEPs though.

I'm thinking IsConstantOffsetFromGlobal might be more useful here since it already contains a bunch of logic for handling ConstantExpr GEPs, then you can check if the global found by that is dso_local.

76–77 ↗(On Diff #324144)

What's the reason for why the number of users matters?

92–95 ↗(On Diff #324144)

I think the visibility and linkage should be set the same as those of the original lookup table.

I think to avoid many changes to the original lookup table and only focus on the new layout, we should also propagate any properties of the original table to the new relative table. That is, visibility, linkage, attributes, unnamed_addr, etc. should be copied from the original.

(For copying attributes, you can use copyAttributesFrom.)

llvm/test/Transforms/SimplifyCFG/X86/relative_lookup_table.ll
31 ↗(On Diff #324144)

We should also have cases that cover other linkages/visibilities:

  • If the table elements are extern dso_local or extern hidden, we should still expect a relative lookup table
  • If the switch table is not dso_local/hidden, we shouldn't expect a relative lookup table
gulfem updated this revision to Diff 326265.Feb 24 2021, 6:45 PM
  • Rename the pass to RelLookupTableConverter to be consistent
  • Addressed reviewers' feedback
  • Added tests for user-defined lookup tables and hidden visibility
gulfem retitled this revision from [Passes] Add relative lookup table generator pass to [Passes] Add relative lookup table converter pass.Feb 24 2021, 6:47 PM
gulfem marked 17 inline comments as done.Feb 24 2021, 6:54 PM
gulfem added inline comments.
clang/test/CodeGen/switch-to-lookup-table.c
2 ↗(On Diff #318372)

I added x86-registered-target to ensure that it only runs on buildbots that have x86_64 as a registered target

llvm/lib/Transforms/Utils/RelLookupTableGenerator.cpp
39 ↗(On Diff #324144)

I added the reason, please let me know if that's not clear enough.

169 ↗(On Diff #324144)

Please keep that in mind that I'm deleting a global variable while I'm iterating over global variables.
I don't want to have invalidated iterator, and this is why I did not use a simple for loop.

Thanks for pushing this forward! I think this will be a nice transformation once all the details are worked out.

Thank you very much for all of your wonderful constructive feedback!
I learned much more about LLVM and IR internals.
Appreciate all your help!

It looks like you have everything setup for running on the new PM, but it doesn't look like the pass is added anywhere in the new PM pipeline.

Thank you very much for pointing that out @leonardchan !
I added this pass into both pass managers now.

gulfem added inline comments.Feb 24 2021, 7:09 PM
llvm/lib/Transforms/Utils/RelLookupTableGenerator.cpp
76–77 ↗(On Diff #324144)

We generate one lookup table per switch statement whenever possible.
I think there is only one use of that lookup table which is the`GetElementPtr` instruction.
That is why I checked for one use.
Do you see any issue with that?

gulfem updated this revision to Diff 328354.Mar 4 2021, 6:29 PM
gulfem added a comment.Mar 4 2021, 6:29 PM

Use TTI hook for target machine checks

One thing that just occurred to me: do we also perhaps want to hide this behind a flag? Right now it's being added to various default optimization pipelines, so some users might be surprised if they suddenly see their lookup tables change (either compiler or user generated). Do we want to make this something more opt-in, or is the layout itself not necessarily a concern to most users?

clang/test/CodeGen/switch-to-lookup-table.c
2 ↗(On Diff #318372)

+1 on this. Unless this functionally changes something in the clang codebase, this test shouldn't be here. As hans pointed out, setting the -relocation-model should be enough.

llvm/include/llvm/CodeGen/BasicTTIImpl.h
384 ↗(On Diff #328354)

Sorry, I think you might have explained this offline, but what was the reason for needing this in TTI? I would've though this information could be found in the Module (PIC/no PIC, 64-bit or not, code model). If it turns out all of this is available in Module, then we could greatly simplify some logic here by just checking this at the start of the pass run.

If TTI is needed, then perhaps it may be better to just inline all these checks in convertToRelativeLookupTables since this is the only place this is called. I think we would only want to keep this here as a virtual method if we plan to have multiple TTI-impls overriding this.

llvm/lib/Transforms/IPO/PassManagerBuilder.cpp
882

Should this be added at the end of the pipeline? It could be possible that other passes insert lookup tables but they may be untouched by this if this pass runs before them.

llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
30–31 ↗(On Diff #328354)

Nit: //

40 ↗(On Diff #328354)

isa

53–70 ↗(On Diff #328354)

I think I mentioned this in a previous round of comments, but what you probably want here is just a way to determine if the operand is either a global or some constant offset from global. Right now it looks this this will ignore other constant expressions like bitcasts or ptrtoints which we also want to catch. I think it might be better to use IsConstantOffsetFromGlobal which already handles these cases.

79 ↗(On Diff #328354)

cast

218 ↗(On Diff #328354)

Should we be returning the value returned by this?

gulfem updated this revision to Diff 330405.Mar 12 2021, 5:22 PM
  1. Used IsConstantOffsetFromGlobal() function
  2. Added this pass to the end of legacy pass manager pipeline
  3. Moved all the tests under a new test directory
gulfem marked 7 inline comments as done.Mar 12 2021, 6:12 PM
gulfem added inline comments.
llvm/include/llvm/CodeGen/BasicTTIImpl.h
384 ↗(On Diff #328354)

Code model or PIC/noPIC is only set in the module if the user explicitly specifies them.
TTI hook is necessary to access target machine information like the default code model.
TTI is basically the interface to communicate between IR transformations and codegen.

I think the checks cannot be moved into the pass because it uses the TargetMachine which is only available/visible in the TTI implementation itself.
That's why I added a function into TTI to do target machine specific checks.
Similar approach is used during lookup table generation (shouldBuildLookupTables) to check codegen info.

leonardchan accepted this revision.Mar 15 2021, 11:56 AM

LGTM pending a few more comments. Should also give some time to let others respond if they have feedback.

llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
41–42 ↗(On Diff #330405)

Not needed here since you have the isa below.

llvm/test/Transforms/RelLookupTableConverter/relative_lookup_table.ll
2–3 ↗(On Diff #330405)

We should also check some other RUNs to check that this isn't run on cases that return false in shouldBuildRelLookupTables: non-PIC, non-64-bit, other code model sizes, etc.

llvm/test/Transforms/RelLookupTableConverter/switch_relative_lookup_table.ll
39 ↗(On Diff #330405)

It looks like this test case isn't much different from string_table in relative_lookup_table.ll? If so, then this file could be removed.

llvm/utils/gn/secondary/llvm/lib/Transforms/Utils/BUILD.gn
64 ↗(On Diff #330405)

Good that you added this, but I think Nico has a bot that automatically updates these BUILD.gn files so manually updating them may not be necessary.

This revision is now accepted and ready to land.Mar 15 2021, 11:56 AM
lebedev.ri added inline comments.Mar 15 2021, 12:04 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
390–391 ↗(On Diff #330405)
  1. But all tests are using x86_64 triple?
  2. This is somewhat backwards. if the target wants to disable this, it will need to override this function with return false;.
hans added a comment.Mar 16 2021, 10:56 AM

Sorry for being unresponsive for a while, I got distracted by various bugs.

I skimmed this and it's looking great. Just added a few nit picks.

llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
32 ↗(On Diff #330405)

It would be better if the comment said why. I suppose the reason is we need to be sure there aren't other uses of the table, because then it can't be replaced.

But it would be cool if a future version of this patch could handle when there are multiple loads from the table which can all be replaced -- for example this could happen if a function which uses a lookup table gets inlined into multiple places.

47 ↗(On Diff #330405)

It would be good with a "why" here too.

gulfem updated this revision to Diff 331139.Mar 16 2021, 6:01 PM
  1. Add no_relative_lookup_table.ll test case that checks the cases where relative lookup table should not be generated
  2. Add comments about dso_local check and single use check
gulfem marked 4 inline comments as done.Mar 16 2021, 6:10 PM
gulfem added inline comments.
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
32 ↗(On Diff #330405)

I actually ran into the exact case that you described during testing, where a function that uses a switch gets inlined into multiple call sites :)
This is only to simplify the analysis, and I now added a TODO section to explore that later.

llvm/test/Transforms/RelLookupTableConverter/switch_relative_lookup_table.ll
39 ↗(On Diff #330405)

I renamed this test case to no_relative_lookup_table.ll that checks the cases where relative lookup table should not be generated like in non-pic mode, medium or large code models, and 32 bit architectures, etc.

gulfem marked 2 inline comments as done.Mar 16 2021, 6:20 PM
gulfem added inline comments.
llvm/include/llvm/CodeGen/BasicTTIImpl.h
390–391 ↗(On Diff #330405)
  1. Although I used x86_64 triple, this optimization can be applied to other 64-bit architectures too, because it not target dependent except isArch64Bit and getCodeModel check.
  2. Is there a target that you have in mind that we need to disable this optimization?

I thought that it makes sense to enable this optimization by default on all the targets that can support it.
In case targets want to disable it, they can override it as you said.
How can we improve the implementation?
If you have suggestions, I'm happy to incorporate that.

@lebedev.ri do you have further comments?
If not, I would like to submit this patch.

lebedev.ri added inline comments.Mar 19 2021, 12:01 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
390–391 ↗(On Diff #330405)

I'm sorry, i do not understand.
Why does !TM.getTargetTriple().isArch64Bit() check exist?
To me it reads as "if we aren't compiling for AArch64, don't build rel lookup tables".
Am i misreading this?

gulfem added inline comments.Mar 19 2021, 2:19 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
390–391 ↗(On Diff #330405)

isArch64Bit checks whether we have a 64-bit architecture, right?
I don't think it specifically checks for AArch64, and it can cover other 64-bit architectures like x86_64 as well.

lebedev.ri accepted this revision.Mar 19 2021, 2:35 PM

Thank you!

llvm/include/llvm/CodeGen/BasicTTIImpl.h
390–391 ↗(On Diff #330405)

isArch64Bit checks whether we have a 64-bit architecture, right?

D'oh. I really did read it as A*A*rch64 :/ Sorry.

llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
172–173 ↗(On Diff #331139)

make_early_inc_range()?

200 ↗(On Diff #331139)

I would think this should be

PreservedAnalyses PA;
PA.preserveSet<CFGAnalyses>();
return PA;

since this doesn't touch CFG at all.
I think this should get rid of redundant Running analysis: TargetIRAnalysis.

This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Mar 22 2021, 3:36 PM
thakis added inline comments.
llvm/utils/gn/secondary/llvm/lib/Transforms/Utils/BUILD.gn
64 ↗(On Diff #330405)

Please don't touch gn files unless you use them. Simple file additions in cmake files are synced automatically http://github.com/llvmgnsyncbot

You forgot to add the trailing comma and now I had to fix it up manually instead of doing nothing :P

gulfem added inline comments.Mar 22 2021, 3:50 PM
llvm/utils/gn/secondary/llvm/lib/Transforms/Utils/BUILD.gn
64 ↗(On Diff #330405)

Sorry about that Niko. I can fix it, so you don't need to do anything.
Leo actually pointed that out, but I thought that manually changing it won't do any harm.
Apparently it did though!

arichardson added inline comments.
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
91 ↗(On Diff #332445)

This should use the same address space as the original global.

It looks like this is breaking the Windows/ARM(64) target - it doesn't produce the right relative relocations for symbol differences. It can be reproduced with a testcase like this:

$ cat test.s
        .text
func1:
        ret
func2:
        ret

        .section        .rdata,"dr"
        .p2align        2
table:
        .long   func1-table
        .long   func2-table
$ clang -target aarch64-windows -c -o - test.s | llvm-objdump -r -s -

<stdin>:        file format coff-arm64

RELOCATION RECORDS FOR [.rdata]:
OFFSET           TYPE                     VALUE
0000000000000000 IMAGE_REL_ARM64_ADDR32   func1
0000000000000004 IMAGE_REL_ARM64_ADDR32   func2

Contents of section .text:
 0000 c0035fd6 c0035fd6                    .._..._.
Contents of section .rdata: 
 0000 00000000 04000000                    ........

Those relocations would need to be IMAGE_REL_ARM64_REL32. It looks like the arm/windows target has got the same issue as well.

Would you be ok with reverting this change until I can sort that out, or can we disable the pass for those targets until then?

It looks like this is breaking the Windows/ARM(64) target - it doesn't produce the right relative relocations for symbol differences. It can be reproduced with a testcase like this:

[...]

Those relocations would need to be IMAGE_REL_ARM64_REL32. It looks like the arm/windows target has got the same issue as well.

Would you be ok with reverting this change until I can sort that out, or can we disable the pass for those targets until then?

It turned out to not be all that hard to fix actually, see D99572 for such a fix. If I can get that landed soon, I think we might not need to act on this one.

rnk added subscribers: aeubanks, rnk.Mar 31 2021, 1:16 PM
rnk added inline comments.
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
322–323 ↗(On Diff #332445)

Putting a ModulePass in the middle of the CodeGen pass pipeline creates a "pass barrier": now instead of applying every pass to each function in turn, the old pass manager will stop, run this whole-module pass, and then run subseqeunt passes in the next function pass manager on each function in turn. This isn't ideal. @aeubanks, can you follow-up to make sure this is addressed?

We had the same issues with the SymbolRewriter pass, which if you grep for "Rewrite Symbols" you can see has the same issue. I remember writing a patch to fix it, but I guess I never landed it.

aeubanks added inline comments.Mar 31 2021, 1:52 PM
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
322–323 ↗(On Diff #332445)

I see "Rewrite Symbols" in the codegen pipeline and yeah it's splitting the function pass manager.

For this patch, can we just not add the pass to the legacy PM pipeline? It's deprecated and the new PM is already the default for the optimization pipeline.

aeubanks added inline comments.Mar 31 2021, 11:31 PM
llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
322–323 ↗(On Diff #332445)

(https://reviews.llvm.org/D99707 for anybody interested)

gulfem added a comment.Apr 1 2021, 5:05 PM

Would you be ok with reverting this change until I can sort that out, or can we disable the pass for those targets until then?

I will disable the pass for those targets for now.
When the issue is resolved, I would like to enable it for those targets as well.

llvm/test/Other/opt-O3-pipeline-enable-matrix.ll
322–323 ↗(On Diff #332445)

For this patch, can we just not add the pass to the legacy PM pipeline? It's deprecated and the new PM is already the default for the optimization pipeline.

@rnk @aeubanks
If it causes issues, I'm ok to remove it from the legacy PM pipeline.
When I land this patch, I'll only add it to new PM.

Would you be ok with reverting this change until I can sort that out, or can we disable the pass for those targets until then?

I will disable the pass for those targets for now.
When the issue is resolved, I would like to enable it for those targets as well.

FYI, I pushed the fix for the aarch64-coff issue now (D99572, rGd5c5cf5ce8d921fc8c5e1b608c298a1ffa688d37) and pushed another commit to remove the code for disabling the pass on aarch64 (rG57b259a852a6383880f5d0875d848420bb3c2945).

FYI, I pushed the fix for the aarch64-coff issue now (D99572, rGd5c5cf5ce8d921fc8c5e1b608c298a1ffa688d37) and pushed another commit to remove the code for disabling the pass on aarch64 (rG57b259a852a6383880f5d0875d848420bb3c2945).

Thank you @mstorsjo!

This patch breaks a two stage build with LTO:

$ git bisect log
# bad: [f0bc2782f281ca05221d2f1735bbaff6c4b81ebb] [TTI] NFC: Remove unused 'OptSize' parameter from shouldMaximizeVectorBandwidth
# good: [9829f5e6b1bca9b61efc629770d28bb9014dec45] [CVP] @llvm.[us]{min,max}() intrinsics handling
git bisect start 'f0bc2782f281ca05221d2f1735bbaff6c4b81ebb' '9829f5e6b1bca9b61efc629770d28bb9014dec45'
# bad: [1af35e77f4b8c3314dc20a10d579b52f22c75a00] [TTI] NFC: Change getVectorInstrCost to return InstructionCost
git bisect bad 1af35e77f4b8c3314dc20a10d579b52f22c75a00
# bad: [45f8946a759a780e6131256d6d206977b9c128ee] [CodeView] Fix the ARM64 CPUType enum
git bisect bad 45f8946a759a780e6131256d6d206977b9c128ee
# good: [0b439e4cc9dbb5c226121383b84d4f48ab669c55] [libc++] Split std::allocator out of <memory>
git bisect good 0b439e4cc9dbb5c226121383b84d4f48ab669c55
# good: [2eb98d89ac866e32cb56727174e4d1c1413479c8] [mlir][spirv] Allow bitwidth emulation on runtime arrays
git bisect good 2eb98d89ac866e32cb56727174e4d1c1413479c8
# bad: [80aa9b0f7b3ebe53220a398b2939610d8a49e24b] [PowerPC] stop reverse mem op generation for some cases.
git bisect bad 80aa9b0f7b3ebe53220a398b2939610d8a49e24b
# good: [a8ab1f98d22cf15f39dd1c2ce77675e628fceb31] [Evaluator] Look through invariant.group intrinsics
git bisect good a8ab1f98d22cf15f39dd1c2ce77675e628fceb31
# bad: [30f591c3869f3bbe6eca1249dcef1b8337312de6] [lldb] Disable TestLaunchProcessPosixSpawn.py with reproducers
git bisect bad 30f591c3869f3bbe6eca1249dcef1b8337312de6
# good: [6c4f2508e4278ac789230cb05f2bb56a8a7297dc] Revert "[lldb] [gdb-remote client] Refactor handling qSupported"
git bisect good 6c4f2508e4278ac789230cb05f2bb56a8a7297dc
# bad: [e96df3e531f506eea75da0f13d0f8aa9a267f975] [Passes] Add relative lookup table converter pass
git bisect bad e96df3e531f506eea75da0f13d0f8aa9a267f975
# good: [1310a19af06262122a6e9e4f6fbbe9c39ebad76e] [mlir] Use MCJIT to fix integration tests
git bisect good 1310a19af06262122a6e9e4f6fbbe9c39ebad76e
# first bad commit: [e96df3e531f506eea75da0f13d0f8aa9a267f975] [Passes] Add relative lookup table converter pass

My reproduction steps, apologies if they are not reduced enough:

$ mkdir -p build/stage{1,2}

$ cmake \
   -B build/stage1 \
   -G Ninja \
   -DCMAKE_C_COMPILER=$(command -v clang) \
   -DCMAKE_CXX_COMPILER=$(command -v clang++) \
   -DLLVM_USE_LINKER=$(command -v ld.lld) \
   -DLLVM_ENABLE_PROJECTS="clang;lld" \
   -DLLVM_TARGETS_TO_BUILD=host \
   -DLLVM_CCACHE_BUILD=ON \
   -DCMAKE_BUILD_TYPE=Release \
   llvm
...

$ ninja -C build/stage1 all
...

$ cmake \
   -B build/stage2 \
   -G Ninja \
   -DCMAKE_AR=$PWD/build/stage1/bin/llvm-ar \
   -DCMAKE_C_COMPILER=$PWD/build/stage1/bin/clang \
   -DCLANG_TABLEGEN=$PWD/build/stage1/bin/clang-tblgen \
   -DCMAKE_CXX_COMPILER=$PWD/build/stage1/bin/clang++ \
   -DLLVM_USE_LINKER=$PWD/build/stage1/bin/ld.lld \
   -DLLVM_TABLEGEN=$PWD/build/stage1/bin/llvm-tblgen \
   -DCMAKE_RANLIB=$PWD/build/stage1/bin/llvm-ranlib \
   -DLLVM_ENABLE_PROJECTS=clang \
   -DLLVM_TARGETS_TO_BUILD=host \
   -DCMAKE_BUILD_TYPE=Release \
   -DLLVM_ENABLE_LTO=Full \
   llvm

$ ninja -C build/stage2 lib/libclang-cpp.so.13git
...
[2296/2296] Linking CXX shared library lib/libclang-cpp.so.13git
FAILED: lib/libclang-cpp.so.13git
...
ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<3600l, 1l> >::value; recompile with -fPIC
>>> defined in lto.tmp
>>> referenced by ld-temp.o
>>>               lto.tmp:(.data.rel.ro..Lreltable._ZN5clang10TargetInfo21getTypeFormatModifierENS_23TransferrableTargetInfo7IntTypeE+0x8)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<3600l, 1l> >::value; recompile with -fPIC
>>> defined in lto.tmp
>>> referenced by ld-temp.o
>>>               lto.tmp:(.data.rel.ro..Lreltable._ZN5clang10TargetInfo21getTypeFormatModifierENS_23TransferrableTargetInfo7IntTypeE+0xC)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<3600l, 1l> >::value; recompile with -fPIC
>>> defined in lto.tmp
>>> referenced by ld-temp.o
>>>               lto.tmp:(.data.rel.ro..Lreltable._ZNK5clang21analyze_format_string14LengthModifier8toStringEv+0x8)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<60l, 1l> >::value; recompile with -fPIC
>>> defined in lto.tmp
>>> referenced by ld-temp.o
>>>               lto.tmp:(.data.rel.ro..Lreltable._ZNK5clang21analyze_format_string14LengthModifier8toStringEv+0x3C)

ld.lld: error: relocation R_X86_64_PC32 cannot be used against symbol llvm::detail::unit<std::ratio<3600l, 1l> >::value; recompile with -fPIC
>>> defined in lto.tmp
>>> referenced by ld-temp.o
>>>               lto.tmp:(.data.rel.ro..Lreltable._ZN4llvm15MCSymbolRefExpr18getVariantKindNameENS0_11VariantKindE+0xC0)
clang-13: error: linker command failed with exit code 1 (use -v to see invocation)
ninja: build stopped: subcommand failed.

This patch breaks a two stage build with LTO:

Thanks for the report @nathanchance, and I'm looking at it now.
Is this failure from one of the bots?

Thanks for the report @nathanchance, and I'm looking at it now.

Thanks!

Is this failure from one of the bots?

No, this was discovered by a user of my LLVM build script:

https://github.com/ClangBuiltLinux/tc-build

https://github.com/kdrag0n/proton-clang-build/runs/2371499466?check_suite_focus=true

gulfem added a comment.EditedApr 26 2021, 2:40 PM

@nathanchance do you prefer me to revert the patch first as it might take me a while to investigate it?
Btw, I was able to reproduce the issue by following your steps.

@nathanchance do you prefer me to revert the patch first as it might take me a while to investigate it?
Btw, I was able to reproduce the issue by following your steps.

I would say if it is going to take longer than the end of the week to fix the issue, it might be nice to have it reverted so that other people's builds continue to work (I know a few people who build with full LTO in a two stage configuration every week).

gulfem added a comment.EditedApr 28 2021, 3:45 PM

I would say if it is going to take longer than the end of the week to fix the issue, it might be nice to have it reverted so that other people's builds continue to work (I know a few people who build with full LTO in a two stage configuration every week).

Definitely! If I cannot fix it in two days, I'll revert the change.
Update: @nathanchance I need to investigate this issue further, so I disabled this pass in LTO pre-link phase to fix the broken build (https://reviews.llvm.org/D94355).
Please let me know if there is still an issue in that build!

pcc added a subscriber: pcc.Jun 23 2021, 8:50 PM
pcc added inline comments.
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
73 ↗(On Diff #332445)

In the version of the patch that you committed, you have this check here:

// If operand is mutable, do not generate a relative lookup table.
auto *GlovalVarOp = dyn_cast<GlobalVariable>(GVOp);
if (!GlovalVarOp || !GlovalVarOp->isConstant())
  return false;
  1. Nit: Gloval -> Global
  2. Why is it important whether the referenced global is mutable? The pointer itself is constant.
gulfem added inline comments.Jun 25 2021, 4:16 PM
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
73 ↗(On Diff #332445)
  1. That's a typo, and I will fix that.
  2. This optimization does not do a detailed analysis, and it is being conservative. In this case, GlobalVar points to the switch lookup table and GlobalVarOp points to the elements in the lookup table like strings. To make sure that relative arithmetic works, it just checks whether both GlobalVar and GlobalVarOp pointers are constants. Did you see an issue on that?

FWIW, this commit turned out to break the FreeBSD dns/bind916 port, see https://bugs.freebsd.org/259921.

The short story is that the bind9 code on and after this line: https://gitlab.isc.org/isc-projects/bind9/-/blob/main/lib/isc/log.c#L1525 gets changed from something like:

.Ltmp661:
        #DEBUG_VALUE: isc_log_doit:category_channels <- $r12
        .loc    3 0 58                          # log.c:0:58
        xorl    %eax, %eax
        testl   %r15d, %r15d
        setg    %al
        movl    %r15d, %ecx
        negl    %ecx
        movq    %rcx, -840(%rbp)                # 8-byte Spill
        leaq    8328(%r13), %rcx
        #DEBUG_VALUE: isc_log_doit:matched <- 0
        movq    %rcx, -808(%rbp)                # 8-byte Spill
.Ltmp662:
        .loc    3 1552 25 is_stmt 1             # log.c:1552:25

to using a relative lookup table:

.Ltmp661:
        #DEBUG_VALUE: isc_log_doit:category_channels <- $r12
        .loc    3 0 58                          # log.c:0:58
        xorl    %eax, %eax
        testl   %r15d, %r15d
        setg    %al
        movl    %r15d, %edx
        negl    %edx
        leaq    reltable.isc_log_doit(%rip), %rcx
        movq    %rdx, -848(%rbp)                # 8-byte Spill
        movslq  (%rcx,%rdx,4), %rdx
        addq    %rcx, %rdx
        movq    %rdx, -840(%rbp)                # 8-byte Spill
        leaq    8328(%r13), %rcx
        #DEBUG_VALUE: isc_log_doit:matched <- 0
        movq    %rcx, -808(%rbp)                # 8-byte Spill
.Ltmp662:
        .loc    3 1552 25 is_stmt 1             # log.c:1552:25

However, the value of %rcx at the movslq (%rcx,%rdx,4), %rdx statement becomes -2, so it attempts to access data before reltable.isc_log_doit. As that is in .rodata, this leads to a segfault.

The current working theory is that some code is hoisted out of the do-while loop starting at https://gitlab.isc.org/isc-projects/bind9/-/blob/main/lib/isc/log.c#L1531, in particular the [-level] accesses on lines 1613 and 1843:

                                snprintf(level_string, sizeof(level_string),
                                         "%s: ", log_level_strings[-level]);
...
                        } else {
                                syslog_level = syslog_map[-level];
                        }

but maybe these negative offsets confuse the lookup table converter?

jrtc27 added inline comments.Dec 10 2021, 2:46 PM
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
132 ↗(On Diff #332445)

This line causes the bug seen in bind. In that case, the GEP has been hoisted, but the load has not. In general the GEP could be in a different basic block, or even in the same basic block with an instruction that may not return (intrinsic, real function call, well-defined language-level exception, etc). You can insert the reltable.shift where the GEP is, and that probably makes sense given it serves (part of) the same purpose, but you *must* insert the actual reltable.intrinsic where the original load is, unless you've gone to great lengths to prove it's safe not to (which seems best left to the usual culprits like LICM).

IR test cases: https://godbolt.org/z/YMdaMrobE (bind is characterised by the first of the two functions)

jrtc27 added inline comments.Dec 10 2021, 2:52 PM
llvm/include/llvm/CodeGen/BasicTTIImpl.h
395 ↗(On Diff #332445)

The meanings of code models isn't really portable across targets... e.g. RISC-V's medium (underlying LLVM name for -mcmodel=medany) assumes 32-bit PC-relative offsets, and thus using a 32-bit table-relative offset is safe

gulfem added inline comments.Dec 10 2021, 7:39 PM
llvm/lib/Transforms/Utils/RelLookupTableConverter.cpp
132 ↗(On Diff #332445)

@dim and @jrtc27 thank you for reporting it.
I see what's going wrong, and I uploaded a patch that fixes the issue by ensuring that the call to load.relative.intrinsic is inserted before the load, but not gep.
Please see https://reviews.llvm.org/D115571.