Page MenuHomePhabricator

[IPSCCP] don't propagate constant in section when caller/callee sections mismatch
Needs ReviewPublic

Authored by nickdesaulniers on Mar 4 2021, 11:57 AM.

Details

Summary

The Linux kernel uses section attributes on functions and data that's
used only during initialization to reclaim memory backing such code and
data post initialization.

The preprocessor defines init (for code) and initdata (for global
variables) expand to to:
attribute((section(".init"))) and
attribute((section(".init.data"))) respectively. See also
https://www.kernel.org/doc/html/latest/kernel-hacking/hacking.html?highlight=__initdata#init-exit-initdata.

So a commonly recurring pattern in the kernel is:

initdata int z;
static void callee (int x) { int y = x; }
init void caller (void) { callee(z); }

InterProcedural Sparse Conditional Constant Propagation (IPSCCP) can
turn the above into:

initdata int z;
static void callee (int x) { int y = z; }
init void caller (void) { callee(z); }

Note how callee directly references z directly now, rather than the
parameter x. Later, Dead Argument Elimination (deadargelim) may even
change the signature of callee removing dead arguments from its
signature, avoiding call setup in the caller for those arguments.

Now, consider what happens when callee is *not* inlined into caller.
Upon initialization, the kernel will reclaim z and caller, but not
callee. At best, we can consider this a memory leak. At worst, we've now
left behind a potential gadget for use after free.

With recent changes to enable NPM by default in clang-13
(https://reviews.llvm.org/D95380), the inlining heuristics have been
perturbed, which is leading to many cases in the Linux kernel where
callee was previously being inlined (avoiding all of the above
problems), and now is not.

This patch records the Value's used as parameters when caller and callee
are in explicitly different sections. Then, when IPSCCP goes to perform
transforms first checks if the GlobalValue that's the replacement comes
from an explicit section, and if the Value being replaced was from a
caller/callee section mismatch, and if so bails.

Care is taken to avoid not preventing the optimization for the general
case where section attributes are not used, or at least match. This
results in no change in binary size for the Linux kernel (x86_64
defconfig); there is a tradeoff in .text vs relocations of 24B
(insignificant, 3.64E-7%).

Alternative approaches considered:

  • Marking callee __init. We can't generally do this, as otherwise every

helper function wouldn't be callable from non __init code, lest it run
the risk of jumping to unmapped/remapped memory.

  • Inheriting __init on callee. While it appears that LLVM is creating a

specialized version of callee, it's technically IPSCCP and DeadArgElim
working together. I don't think adding section attributes to callers is
a general solution.

  • Use of attribute((always_inline)). This is tricky because it's

very common in the kernel for callee to be a static inline function
defined in a header. It's infeasible to add such function attribute to
every helper function, hurts compile time, and it's a relatively large
hammer to force the callee to be inlined into *every* caller. I'd argue
that IPSCCP in the case described is dangerous, regardless of inlining.

  • Adjusting InlineCost heuristics. We might be able to further discount

the cost for such specific cases described, or try to do somehow when
DeadArgElim has created such a specialized version of callee tightly
bound to caller. With the recent changes to inlining from NPM, I don't
want to perturb the InlineCost heuristic further right now.

Link: https://github.com/ClangBuiltLinux/linux/issues/1302
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>

Diff Detail

Unit TestsFailed

TimeTest
30 msx64 windows > LLVM.ExecutionEngine/JITLink/AArch64::MachO_arm64_ehframe.test
Script: -- : 'RUN: at line 1'; c:\ws\w16n2-1\llvm-project\premerge-checks\build\bin\llvm-jitlink.exe -noexec C:\ws\w16n2-1\llvm-project\premerge-checks\llvm\test\ExecutionEngine\JITLink\AArch64/Inputs/MachO_arm64_ehframe.o
70 msx64 windows > LLVM.ExecutionEngine/JITLink/AArch64::MachO_arm64_relocations.s
Script: -- : 'RUN: at line 1'; rm -rf C:\ws\w16n2-1\llvm-project\premerge-checks\build\test\ExecutionEngine\JITLink\AArch64\Output\MachO_arm64_relocations.s.tmp && mkdir -p C:\ws\w16n2-1\llvm-project\premerge-checks\build\test\ExecutionEngine\JITLink\AArch64\Output\MachO_arm64_relocations.s.tmp
80 msx64 windows > LLVM.ExecutionEngine/JITLink/X86::ELF_skip_debug_sections.s
Script: -- : 'RUN: at line 2'; c:\ws\w16n2-1\llvm-project\premerge-checks\build\bin\llvm-mc.exe -triple=x86_64-pc-linux-gnu -filetype=obj -o C:\ws\w16n2-1\llvm-project\premerge-checks\build\test\ExecutionEngine\JITLink\X86\Output\ELF_skip_debug_sections.s.tmp C:\ws\w16n2-1\llvm-project\premerge-checks\llvm\test\ExecutionEngine\JITLink\X86\ELF_skip_debug_sections.s
90 msx64 windows > LLVM.ExecutionEngine/JITLink/X86::ELF_weak_definitions.s
Script: -- : 'RUN: at line 1'; rm -rf C:\ws\w16n2-1\llvm-project\premerge-checks\build\test\ExecutionEngine\JITLink\X86\Output\ELF_weak_definitions.s.tmp && mkdir -p C:\ws\w16n2-1\llvm-project\premerge-checks\build\test\ExecutionEngine\JITLink\X86\Output\ELF_weak_definitions.s.tmp
80 msx64 windows > LLVM.ExecutionEngine/JITLink/X86::ELF_x86-64_common.s
Script: -- : 'RUN: at line 1'; rm -rf C:\ws\w16n2-1\llvm-project\premerge-checks\build\test\ExecutionEngine\JITLink\X86\Output\ELF_x86-64_common.s.tmp && mkdir -p C:\ws\w16n2-1\llvm-project\premerge-checks\build\test\ExecutionEngine\JITLink\X86\Output\ELF_x86-64_common.s.tmp
View Full Test Results (19 Failed)

Event Timeline

nickdesaulniers requested review of this revision.Mar 4 2021, 11:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2021, 11:57 AM
lebedev.ri edited reviewers, added: efriedma; removed: eli.friedman.Mar 4 2021, 12:00 PM
nickdesaulniers edited the summary of this revision. (Show Details)Mar 4 2021, 12:01 PM
nickdesaulniers planned changes to this revision.Mar 4 2021, 12:03 PM

Sigh...it looks like this works for defconfigs, but not allmodconfigs (https://github.com/ClangBuiltLinux/linux/issues/1302). Let me see what's going on there and fix this up. In the meantime, RFC.

lebedev.ri added a subscriber: lebedev.ri.

Does langref's section actually provide such guarantees?

If we do this, we should create a helper somewhere that tells you if you can forward a constant, given an (Abstract)CallSite. For example, thread local constants can't be forwarded across callbacks.
So it's not only that we (might) have multiple things we need to filter but also multiple locations, Attributor needs this as well.

MaskRay added a comment.EditedMar 4 2021, 2:26 PM

void callee (int* x) { int y = *z; }

Should it be void callee (int* x) { int y = z; } ?

Later, Dead Argument Elimination (deadargelim) may even change the signature of callee removing dead arguments from its signature

Assuming void callee does not have the static keyword. deadargelim applies on local linkage functions.
LTO may internalize functions if the linker knows callee does not need to be exported. Yes, deadargelim can happen in that case.

https://github.com/ClangBuiltLinux/linux/issues/1301

WARNING: modpost: vmlinux.o(.text+0x68943): Section mismatch in reference from the function __nodes_weight() to the variable .init.data:numa_nodes_parsed

Is the warning about a non-.init.text function accesses .init.data data?
This feels to me that the kernel is asking too much.

If we rename .init.text to .text.hot. and .init.data to .data.hot., this rule would mean that a regular .text function cannot access .data.hot. data.
This does not sound right to me.
If the kernel wants to have an optimization barrier on the arguments, it should use some attributes if it does not want to use alwaysinline.

nickdesaulniers edited the summary of this revision. (Show Details)
  • solve recursive GEP, greatly simplify implementation, update LangRef, update description verbatim

Does langref's section actually provide such guarantees?

Added a note, PTAL.

If we do this, we should create a helper somewhere that tells you if you can forward a constant, given an (Abstract)CallSite. For example, thread local constants can't be forwarded across callbacks.
So it's not only that we (might) have multiple things we need to filter but also multiple locations, Attributor needs this as well.

Yep, that sounds feasible, and with my rewrite is fairly straightforward now that it's decoupled from SCCSolver. For the case you describe, does it also pertain to passing arguments or no? While my current implementation is tailed to the existing argument walk on the CallBase, that could simply be done for such a helper function you describe. Do you have a recommendation on a good location to place such a helper so that SCCSolver and Attributor could both refer to it?

void callee (int* x) { int y = *z; }

Should it be void callee (int* x) { int y = z; } ?

Thanks, fixed.

Later, Dead Argument Elimination (deadargelim) may even change the signature of callee removing dead arguments from its signature

Assuming void callee does not have the static keyword. deadargelim applies on local linkage functions.
LTO may internalize functions if the linker knows callee does not need to be exported. Yes, deadargelim can happen in that case.

Fixed the description to note that callee is static.

https://github.com/ClangBuiltLinux/linux/issues/1301

WARNING: modpost: vmlinux.o(.text+0x68943): Section mismatch in reference from the function __nodes_weight() to the variable .init.data:numa_nodes_parsed

Is the warning about a non-.init.text function accesses .init.data data?

Yes. That's a potential use after free, since .init.data data will be unmapped (and potentially remapped) after initialization, while the non-.init.text (ie. just `.text) function is not.

This feels to me that the kernel is asking too much.

Perhaps; do you have a recommendation on how best to pass references to global data in different sections otherwise? I'll note the use of this pattern of __init and __init_data is so pervasive throughout the sources, that "don't do that" is a non-starter. But perhaps there's another approach, like storing the constant to a pointer or something, or some other attributes you mention below?

If we rename .init.text to .text.hot. and .init.data to .data.hot., this rule would mean that a regular .text function cannot access .data.hot. data.

If we rename .init.text and .init.data, it will leak memory in the kernel as none of the annotated sections will get cleaned up.

This does not sound right to me.
If the kernel wants to have an optimization barrier on the arguments, it should use some attributes if it does not want to use alwaysinline.

Are there existing attributes we can use on parameters or arguments?

Does langref's section actually provide such guarantees?

Added a note, PTAL.

If we do this, we should create a helper somewhere that tells you if you can forward a constant, given an (Abstract)CallSite. For example, thread local constants can't be forwarded across callbacks.
So it's not only that we (might) have multiple things we need to filter but also multiple locations, Attributor needs this as well.

Yep, that sounds feasible, and with my rewrite is fairly straightforward now that it's decoupled from SCCSolver. For the case you describe, does it also pertain to passing arguments or no? While my current implementation is tailed to the existing argument walk on the CallBase, that could simply be done for such a helper function you describe. Do you have a recommendation on a good location to place such a helper so that SCCSolver and Attributor could both refer to it?

void callee (int* x) { int y = *z; }

Should it be void callee (int* x) { int y = z; } ?

Thanks, fixed.

Later, Dead Argument Elimination (deadargelim) may even change the signature of callee removing dead arguments from its signature

Assuming void callee does not have the static keyword. deadargelim applies on local linkage functions.
LTO may internalize functions if the linker knows callee does not need to be exported. Yes, deadargelim can happen in that case.

Fixed the description to note that callee is static.

https://github.com/ClangBuiltLinux/linux/issues/1301

WARNING: modpost: vmlinux.o(.text+0x68943): Section mismatch in reference from the function __nodes_weight() to the variable .init.data:numa_nodes_parsed

Is the warning about a non-.init.text function accesses .init.data data?

Yes. That's a potential use after free, since .init.data data will be unmapped (and potentially remapped) after initialization, while the non-.init.text (ie. just `.text) function is not.

This feels to me that the kernel is asking too much.

Perhaps; do you have a recommendation on how best to pass references to global data in different sections otherwise? I'll note the use of this pattern of __init and __init_data is so pervasive throughout the sources, that "don't do that" is a non-starter. But perhaps there's another approach, like storing the constant to a pointer or something, or some other attributes you mention below?

If we rename .init.text to .text.hot. and .init.data to .data.hot., this rule would mean that a regular .text function cannot access .data.hot. data.

If we rename .init.text and .init.data, it will leak memory in the kernel as none of the annotated sections will get cleaned up.

This does not sound right to me.
If the kernel wants to have an optimization barrier on the arguments, it should use some attributes if it does not want to use alwaysinline.

Are there existing attributes we can use on parameters or arguments?

Hope @jdoerfert can shed some light on the attribute usage here...

This does not sound right to me.
If the kernel wants to have an optimization barrier on the arguments, it should use some attributes if it does not want to use alwaysinline.

Are there existing attributes we can use on parameters or arguments?

Hope @jdoerfert can shed some light on the attribute usage here...

The only existing one to disable this would be optnone, IIRC, not that it is a solution here. More than once people asked for noipa to disable interprocedural optimization across a call edge. While that might help we'd still need to attach it all over the place and the way I understand the problem we are only concerned with constant prop, not things like inlining.

I guess I would have put something like this into TargetTransformInfo only because helpers like areFunctionArgsABICompatible and areInlineCompatible are there as well:

/// TODO
bool isConstantCompatibleWithCallee(Constant &C, CallBase &CB) {
  Function *Callee = CB.getCalledFunction();
  if (auto *GV = dyn_cast<GlobalValue>(C.stripInBoundsConstantOffsets()))
    return Callee && Callee->getSection() == GV->getSection();
  return true;
}

I can add the AbstractCallSite version after.

fhahn added inline comments.Mar 6 2021, 10:35 AM
llvm/lib/Transforms/Scalar/SCCP.cpp
1277

Shouldn't we check against the current state we have for CAI (using getValueState/ getStructValueState)? Otherwise we might still propagate the global into the caller, e.g. due to conditions or loads. Seeh ttps://godbolt.org/z/qMaMWE

Also, IIUC the real problem is the actual replacement in the function, right? If that's the case, can we just skip replacing uses in invalid functions (e.g. in tryToReplaceWithConstant)? That way, we would still be able to simplify conditions (see test3 in the godbolt) and also propagate the constant through functions without the right section to their callees if they have the right section.

  • check lattice for known constants, add test
llvm/lib/Transforms/Scalar/SCCP.cpp
1277

Shouldn't we check against the current state we have for CAI (using getValueState/ getStructValueState)? Otherwise we might still propagate the global into the caller, e.g. due to conditions or loads. Seeh ttps://godbolt.org/z/qMaMWE

Ah, yes! Thanks for the test case; checking the lattice for known constants actually covers your examples and my existing ones. Updated.

Also, IIUC the real problem is the actual replacement in the function, right? If that's the case, can we just skip replacing uses in invalid functions (e.g. in tryToReplaceWithConstant)?

If you take a look at earlier revisions of this diff, that was the first approach I tried. The issue was that at tryToReplaceWithConstant when we're making replacement decisions, we no longer retain information about the callsite (CallBase) in order to check for mismatch between caller and callee. I used a simple vector to retain the list of arguments here in handleCallArguments, then scanned that list during replacement in tryToReplaceWithConstant. But I found it's much less memory intensive and less work to simply detect such case when building the lattice as is done here.

We probably could go back to something closer to the original implementation; perhaps I could add some machinery to ValueLatticeElement to track such cases, then look for that during replacement?

That way, we would still be able to simplify conditions (see test3 in the godbolt) and also propagate the constant through functions without the right section to their callees if they have the right section.

I see, @test3 now returns true after ipsccp. With my current change, @cmp from your godbolt link looks the same as the input after ipsccp is run. Hmm...

nickdesaulniers planned changes to this revision.Mar 8 2021, 12:12 PM
  • go back to stopping replacement post lattice creation, add @fhahn's test3, fix arbitrary operand being GV+Section
nickdesaulniers added inline comments.Mar 8 2021, 5:29 PM
llvm/lib/Transforms/Scalar/SCCP.cpp
1277

Ok, I still have some cleanup to do tomorrow (make NonReplaceable a private member, add comments for it, clean up function names in tests), but I think this now satisfies the additional cases you identified. (It's remarkable to see how well SCCP replace the return value of @test3, from your godbolt link). But I think this is in a good state. (I also had a bug where the constant with an explicit section could be any member of a GEP, not necessarily the first operand, so I changed the isGVSection helper to DFS all operands).

This permits us to fully propagate the constants in the lattice, then simply avoid problematic replacements.

I might have more changes I'll want to do tomorrow; but if @fhahn is ok with this approach, I'll move what I have to TargetTransformInfo as per @jdoerfert 's recommendation.

nickdesaulniers edited the summary of this revision. (Show Details)Mar 8 2021, 5:30 PM
fhahn added inline comments.Mar 10 2021, 12:03 PM
llvm/lib/Transforms/Scalar/SCCP.cpp
1277

If you take a look at earlier revisions of this diff, that was the first approach I tried. The issue was that at tryToReplaceWithConstant when we're making replacement decisions, we no longer retain information about the callsite (CallBase) in order to check for mismatch between caller and callee. I used a simple vector to retain the list of arguments here in handleCallArguments, then scanned that list during replacement in tryToReplaceWithConstant. But I found it's much less memory intensive and less work to simply detect such case when building the lattice as is done here.

Hm I think then I missed something. I though the problem was introducing uses of constants in functions that have different sections. Not sure why the call site would matter. If the call sites matter, I think it would be good to make the explanation in the langref a bit clearer.