The Linux kernel uses section attributes on functions and data that are
used only during initialization, so that it can reclaim the memory
backing such code and data once initialization has completed.
The preprocessor defines __init (for code) and __initdata (for global
variables) expand to:
__attribute__((section(".init.text"))) and
__attribute__((section(".init.data"))) respectively. See also
https://www.kernel.org/doc/html/latest/kernel-hacking/hacking.html?highlight=__initdata#init-exit-initdata.
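As a rough userspace illustration of what those defines do, here is a minimal sketch. The macro names demo_init and demo_initdata are hypothetical stand-ins for __init and __initdata (the kernel's real definitions carry additional attributes), and boot_param and read_boot_param are invented for the example. It assumes an ELF target and a GCC-compatible compiler.

```c
/* Hypothetical stand-ins for the kernel's __init / __initdata macros.
 * The real definitions live in include/linux/init.h and add more
 * attributes; only the section placement is modelled here. */
#define demo_init     __attribute__((section(".init.text")))
#define demo_initdata __attribute__((section(".init.data")))

/* Placed in .init.data: only expected to be valid during init. */
demo_initdata int boot_param = 42;

/* Placed in .init.text: only expected to be callable during init. */
demo_init int read_boot_param(void)
{
    return boot_param;
}
```

In a userspace program nothing ever frees these sections, so the sketch only shows placement. Inspecting the object file with a tool such as objdump or readelf should show both symbols outside the default .text and .data sections.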
So a commonly recurring pattern in the kernel is:
__initdata int z;
static void callee (int x) { int y = x; }
__init void caller (void) { callee(z); }
InterProcedural Sparse Conditional Constant Propagation (IPSCCP) can
turn the above into:
__initdata int z;
static void callee (int x) { int y = z; }
__init void caller (void) { callee(z); }
Note how callee now references z directly, rather than the parameter x.
Later, Dead Argument Elimination (deadargelim) may even change the
signature of callee by removing its dead arguments, avoiding call setup
in the caller for those arguments.
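To make the end state concrete, here is a hand-written sketch of roughly what the example looks like once both passes have run. It is source-level C written by hand, not actual compiler output. The demo_init and demo_initdata macros are hypothetical stand-ins for __init and __initdata, and the initializer on z, the sink variable, the noinline attribute and the observed() helper are additions made only so that the effect can be observed.

```c
/* Hypothetical stand-ins for __init / __initdata (ELF, GCC-compatible). */
#define demo_init     __attribute__((section(".init.text")))
#define demo_initdata __attribute__((section(".init.data")))

demo_initdata int z = 7;  /* initializer added so the read is observable */
static int sink;          /* hypothetical: keeps y from being optimized out */

/* After IPSCCP and deadargelim: the parameter x is gone and callee reads
 * z itself, although callee still lives in the default .text section. */
static __attribute__((noinline)) void callee(void)
{
    int y = z;
    sink = y;
}

/* The call no longer sets up any argument. */
demo_init void caller(void)
{
    callee();
}

/* Hypothetical driver so the result can be checked from a test. */
int observed(void)
{
    caller();
    return sink;
}
```

In userspace this is harmless because .init.data is never freed. In the kernel, the same shape leaves a function in .text holding a reference into memory that is reclaimed after init.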
Now, consider what happens when callee is *not* inlined into caller.
After initialization, the kernel will reclaim z and caller, but not
callee, which still references z. At best, we can consider this a memory
leak. At worst, we've now left behind a potential gadget for use after
free.
With recent changes to enable the new pass manager (NPM) by default in
clang-13 (https://reviews.llvm.org/D95380), the inlining heuristics have
been perturbed. This is leading to many cases in the Linux kernel where
callee was previously being inlined (avoiding all of the above
problems), and now is not.
This patch records the Values used as parameters when caller and callee
are in explicitly different sections. Then, before IPSCCP performs a
replacement, it first checks whether the GlobalValue that is the
replacement comes from an explicit section, and whether the Value being
replaced was recorded from a caller/callee section mismatch. If both
hold, it bails.
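The decision described above can be modelled in a few lines of plain C. This is only a toy model of the logic, not the patch and not LLVM's API: the structs, function names and sample objects are all hypothetical, and treating "one side has an explicit section and the other has none" as a mismatch is my reading of "explicitly different sections".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for a function and a global; section == NULL means
 * "no explicit section attribute". All names here are hypothetical. */
struct fn     { const char *name; const char *section; };
struct global { const char *name; const char *section; };

/* Assumption: sections mismatch if they differ, including the case
 * where only one side has an explicit section. */
static bool sections_differ(const char *a, const char *b)
{
    if (a == NULL && b == NULL)
        return false;
    if (a == NULL || b == NULL)
        return true;
    return strcmp(a, b) != 0;
}

/* Step 1: at a call site, note whether the argument crosses sections. */
static bool arg_crosses_sections(const struct fn *caller,
                                 const struct fn *callee)
{
    return sections_differ(caller->section, callee->section);
}

/* Step 2: before replacing a use with a global, bail only if the global
 * has an explicit section AND the replaced value crossed sections. */
static bool may_replace(const struct global *replacement, bool arg_crossed)
{
    return !(replacement->section != NULL && arg_crossed);
}

/* Sample objects mirroring the example in the description. */
static const struct fn caller_fn      = { "caller", ".init.text" };
static const struct fn callee_fn      = { "callee", NULL };
static const struct fn init_helper_fn = { "init_helper", ".init.text" };
static const struct global z_gv       = { "z", ".init.data" };
static const struct global plain_gv   = { "plain", NULL };
```

With these objects, replacing the argument with z inside callee is rejected, while the same replacement inside init_helper (same section as caller) or with a global that has no explicit section is still allowed, which is the "general case" the patch tries not to pessimize.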
Care is taken to avoid preventing the optimization in the general case
where section attributes are not used, or at least match. This results
in no change in binary size for the Linux kernel (x86_64 defconfig);
there is a tradeoff in .text vs relocations of 24B (insignificant,
3.64E-7%).
Alternative approaches considered:
- Marking callee __init. We can't generally do this: helper functions
would then no longer be callable from non-__init code without running
the risk of jumping to unmapped/remapped memory.
- Inheriting __init on callee. While it appears that LLVM is creating a
specialized version of callee, it's technically IPSCCP and DeadArgElim
working together. I don't think adding section attributes to callers is
a general solution.
- Use of __attribute__((always_inline)). This is tricky because it's
very common in the kernel for callee to be a static inline function
defined in a header. It's infeasible to add such a function attribute
to every helper function, it hurts compile time, and it's a relatively
large hammer to force the callee to be inlined into *every* caller. I'd
argue that IPSCCP in the case described is dangerous, regardless of
inlining.
- Adjusting InlineCost heuristics. We might be able to further discount
the cost for the specific cases described, or try to do so somehow when
DeadArgElim has created such a specialized version of callee tightly
bound to caller. With the recent changes to inlining from NPM, I don't
want to perturb the InlineCost heuristic further right now.
Link: https://github.com/ClangBuiltLinux/linux/issues/1302
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Shouldn't we check against the current state we have for CAI (using getValueState/getStructValueState)? Otherwise we might still propagate the global into the caller, e.g. due to conditions or loads. See https://godbolt.org/z/qMaMWE
Also, IIUC the real problem is the actual replacement in the function, right? If that's the case, can we just skip replacing uses in invalid functions (e.g. in tryToReplaceWithConstant)? That way, we would still be able to simplify conditions (see test3 in the godbolt) and also propagate the constant through functions without the right section to their callees if they have the right section.