When inlining a function that has !associated metadata, update the
metadata to point to caller instead of callee.
Details
Diff Detail
Event Timeline
This is a prerequisite for D76802 because InstrProfiling pass runs before inliner which can invalidate the metadata.
LGTM
llvm/lib/Transforms/Utils/InlineFunction.cpp | ||
---|---|---|
1148 | This is technically N^2 if you do this for every function. I guess it's fine as long as you don't see a bump in the compilation time. |
llvm/lib/Transforms/Utils/InlineFunction.cpp | ||
---|---|---|
1148 | I don't know if the N^2 behavior is avoidable given the structure of the data where the !associated metadata is attached to GlobalValues and AFAIK there's not way to enumerate all values use a particular MDNode. I've at least added extra check to ensure that we only hit this if the value has metadata attached. |
N^2 could be avoided by building a function->associated global_value map to be used in all inlining in the current module, but that seems difficult to plumb through the layers, especially given that the inliner is an SCC pass, not a module pass.
Probably not worth it unless there is data that it slows down compilation time.
So after testing this change a bit more, I found a fundamental issue which is the handling of cases when a function is inlined into multiple functions.
Say we have the following IR:
@__inner_internal_metadata = private global [1 x i64] zeroinitializer, align 8, !associated !0 define internal void @inner_internal() { %1 = load i64, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 0, i64 1), align 8 %2 = add i64 %1, 1 store i64 %2, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 0, i64 1), align 8 ret void } define void @outer1() { call void @inner_internal() ret void } define void @outer2() { call void @inner_internal() ret void } !0 = !{void ()* @inner_internal}
Without this patch, we end up with:
@__inner_internal_metadata = private global [1 x i64] zeroinitializer, align 8, !associated !0 define void @outer1() { %1 = load i64, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8 %2 = add i64 %1, 1 store i64 %2, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8 ret void } define void @outer2() { %1 = load i64, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8 %2 = add i64 %1, 1 store i64 %2, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8 ret void } !0 = distinct !{null}
With this patch, we end up with:
@__inner_internal_metadata = private global [1 x i64] zeroinitializer, align 8, !associated !0 define void @outer1() { %1 = load i64, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8 %2 = add i64 %1, 1 store i64 %2, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8 ret void } define void @outer2() { %1 = load i64, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8 %2 = add i64 %1, 1 store i64 %2, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8 ret void } !0 = !{void ()* @outer2}
Both of these are wrong. In the first case we end up with broken metadata, in the second case the metadata is pointing only to a single function where it should be pointing to both. I'm not sure how to solve this, since there isn't a way to express this with SHF_LINK_ORDER since a single section could only be linked to a single section, but in this case we need a single section (with metadata) to be linked to multiple sections. Any thoughts?
One option we've discussed internally within our team would be to use ELF section groups (not COMDAT which is a special case of section groups) instead of SHF_LINK_ORDER, but those aren't supported in LLVM IR at the moment, and it's also counter to what was discussed in https://groups.google.com/g/generic-abi/c/_CbBM6T6WeM/m/LZnqx1KZAQAJ.
In this case you might consider replacing the global reference with null and implementing my proposal in https://bugs.llvm.org/show_bug.cgi?id=41734#c2 to make the section SHF_LINK_ORDER with a zero-length .init_array section.
Yeah. Thinking about it more, you may consider a different !associated scheme for the __prof*_ globals.
- __profc_* and __profn_* are associated with a dummy global (to allow the sections to be GC'd).
- __profd_* is associated with the corresponding __profc_* section.
Now no special rules are required for inlining. __profc_* is kept alive via the references in the functions, __profd_ is kept alive via !associated and __profn_* is kept alive by direct reference.
I tried that, but now we get an error ld.lld: error: /tmp/instrprof-gc-sections-5f007f.o:(__llvm_prf_cnts): sh_link points to discarded section /tmp/instrprof-gc-sections-5f007f.o:(__llvm_prf_dummy) because the dummy global was discarded which creates a broken sh_link.
Hmm. Maybe instead of making the sections sh_link to a dummy global, make them sh_link to themselves. There was a proposal for sh_link=0 to mean what we want here (D72904) but I believe that sh_link=self should have the same semantics.
I saw you removed the dependency on another patch which was abandoned. Why is it no longer needed?
Also can you collect data of instrumented Clang with IR PGO:
- the instrumented binary size change with/without this patch
- the raw profile data size change with/without the patch.
This is technically N^2 if you do this for every function. I guess it's fine as long as you don't see a bump in the compilation time.