This is an archive of the discontinued LLVM Phabricator instance.

[Inliner] Update !associated metadata during inlining
AbandonedPublic

Authored by phosek on May 19 2020, 12:06 AM.

Details

Reviewers
pcc
eugenis
Summary

When inlining a function that has !associated metadata, update the
metadata to point to caller instead of callee.

Diff Detail

Event Timeline

phosek created this revision.May 19 2020, 12:06 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 19 2020, 12:06 AM

This is a prerequisite for D76802 because InstrProfiling pass runs before inliner which can invalidate the metadata.

eugenis accepted this revision.May 19 2020, 4:06 PM

LGTM

llvm/lib/Transforms/Utils/InlineFunction.cpp
1148

This is technically N^2 if you do this for every function. I guess it's fine as long as you don't see a bump in the compilation time.

This revision is now accepted and ready to land.May 19 2020, 4:06 PM
phosek added inline comments.May 21 2020, 3:01 PM
llvm/lib/Transforms/Utils/InlineFunction.cpp
1148

I don't know if the N^2 behavior is avoidable given the structure of the data where the !associated metadata is attached to GlobalValues and AFAIK there's not way to enumerate all values use a particular MDNode. I've at least added extra check to ensure that we only hit this if the value has metadata attached.

N^2 could be avoided by building a function->associated global_value map to be used in all inlining in the current module, but that seems difficult to plumb through the layers, especially given that the inliner is an SCC pass, not a module pass.
Probably not worth it unless there is data that it slows down compilation time.

phosek planned changes to this revision.Jun 1 2020, 12:21 PM

So after testing this change a bit more, I found a fundamental issue which is the handling of cases when a function is inlined into multiple functions.

Say we have the following IR:

@__inner_internal_metadata = private global [1 x i64] zeroinitializer, align 8, !associated !0

define internal void @inner_internal() {
  %1 = load i64, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 0, i64 1), align 8
  %2 = add i64 %1, 1
  store i64 %2, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 0, i64 1), align 8
  ret void
}

define void @outer1() {
  call void @inner_internal()
  ret void
}

define void @outer2() {
  call void @inner_internal()
  ret void
}

!0 = !{void ()* @inner_internal}

Without this patch, we end up with:

@__inner_internal_metadata = private global [1 x i64] zeroinitializer, align 8, !associated !0

define void @outer1() {
  %1 = load i64, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8
  %2 = add i64 %1, 1
  store i64 %2, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8
  ret void
}

define void @outer2() {
  %1 = load i64, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8
  %2 = add i64 %1, 1
  store i64 %2, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8
  ret void
}

!0 = distinct !{null}

With this patch, we end up with:

@__inner_internal_metadata = private global [1 x i64] zeroinitializer, align 8, !associated !0

define void @outer1() {
  %1 = load i64, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8
  %2 = add i64 %1, 1
  store i64 %2, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8
  ret void
}

define void @outer2() {
  %1 = load i64, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8
  %2 = add i64 %1, 1
  store i64 %2, i64* getelementptr inbounds ([1 x i64], [1 x i64]* @__inner_internal_metadata, i64 1, i64 0), align 8
  ret void
}

!0 = !{void ()* @outer2}

Both of these are wrong. In the first case we end up with broken metadata, in the second case the metadata is pointing only to a single function where it should be pointing to both. I'm not sure how to solve this, since there isn't a way to express this with SHF_LINK_ORDER since a single section could only be linked to a single section, but in this case we need a single section (with metadata) to be linked to multiple sections. Any thoughts?

One option we've discussed internally within our team would be to use ELF section groups (not COMDAT which is a special case of section groups) instead of SHF_LINK_ORDER, but those aren't supported in LLVM IR at the moment, and it's also counter to what was discussed in https://groups.google.com/g/generic-abi/c/_CbBM6T6WeM/m/LZnqx1KZAQAJ.

pcc added a subscriber: MaskRay.Jun 1 2020, 12:36 PM

In this case you might consider replacing the global reference with null and implementing my proposal in https://bugs.llvm.org/show_bug.cgi?id=41734#c2 to make the section SHF_LINK_ORDER with a zero-length .init_array section.

phosek added a comment.Jun 1 2020, 1:15 PM
In D80186#2067107, @pcc wrote:

In this case you might consider replacing the global reference with null and implementing my proposal in https://bugs.llvm.org/show_bug.cgi?id=41734#c2 to make the section SHF_LINK_ORDER with a zero-length .init_array section.

Wouldn't that mean that symbols associated with null won't be GCed by the linker?

pcc added a comment.Jun 1 2020, 1:59 PM

Yeah. Thinking about it more, you may consider a different !associated scheme for the __prof*_ globals.

  • __profc_* and __profn_* are associated with a dummy global (to allow the sections to be GC'd).
  • __profd_* is associated with the corresponding __profc_* section.

Now no special rules are required for inlining. __profc_* is kept alive via the references in the functions, __profd_ is kept alive via !associated and __profn_* is kept alive by direct reference.

In D80186#2067298, @pcc wrote:

Yeah. Thinking about it more, you may consider a different !associated scheme for the __prof*_ globals.

  • __profc_* and __profn_* are associated with a dummy global (to allow the sections to be GC'd).
  • __profd_* is associated with the corresponding __profc_* section.

Now no special rules are required for inlining. __profc_* is kept alive via the references in the functions, __profd_ is kept alive via !associated and __profn_* is kept alive by direct reference.

I tried that, but now we get an error ld.lld: error: /tmp/instrprof-gc-sections-5f007f.o:(__llvm_prf_cnts): sh_link points to discarded section /tmp/instrprof-gc-sections-5f007f.o:(__llvm_prf_dummy) because the dummy global was discarded which creates a broken sh_link.

pcc added a comment.Jun 2 2020, 10:08 AM
In D80186#2067298, @pcc wrote:

Yeah. Thinking about it more, you may consider a different !associated scheme for the __prof*_ globals.

  • __profc_* and __profn_* are associated with a dummy global (to allow the sections to be GC'd).
  • __profd_* is associated with the corresponding __profc_* section.

Now no special rules are required for inlining. __profc_* is kept alive via the references in the functions, __profd_ is kept alive via !associated and __profn_* is kept alive by direct reference.

I tried that, but now we get an error ld.lld: error: /tmp/instrprof-gc-sections-5f007f.o:(__llvm_prf_cnts): sh_link points to discarded section /tmp/instrprof-gc-sections-5f007f.o:(__llvm_prf_dummy) because the dummy global was discarded which creates a broken sh_link.

Hmm. Maybe instead of making the sections sh_link to a dummy global, make them sh_link to themselves. There was a proposal for sh_link=0 to mean what we want here (D72904) but I believe that sh_link=self should have the same semantics.

In D80186#2069110, @pcc wrote:
In D80186#2067298, @pcc wrote:

Yeah. Thinking about it more, you may consider a different !associated scheme for the __prof*_ globals.

  • __profc_* and __profn_* are associated with a dummy global (to allow the sections to be GC'd).
  • __profd_* is associated with the corresponding __profc_* section.

Now no special rules are required for inlining. __profc_* is kept alive via the references in the functions, __profd_ is kept alive via !associated and __profn_* is kept alive by direct reference.

I tried that, but now we get an error ld.lld: error: /tmp/instrprof-gc-sections-5f007f.o:(__llvm_prf_cnts): sh_link points to discarded section /tmp/instrprof-gc-sections-5f007f.o:(__llvm_prf_dummy) because the dummy global was discarded which creates a broken sh_link.

Hmm. Maybe instead of making the sections sh_link to a dummy global, make them sh_link to themselves. There was a proposal for sh_link=0 to mean what we want here (D72904) but I believe that sh_link=self should have the same semantics.

Thanks, that seems to work!

phosek abandoned this revision.Jun 3 2020, 12:43 PM

This is no longer needed for the profile instrumentation.

davidxl added a subscriber: davidxl.Jun 4 2020, 3:19 PM

I saw you removed the dependency on another patch which was abandoned. Why is it no longer needed?

Also can you collect data of instrumented Clang with IR PGO:

  1. the instrumented binary size change with/without this patch
  2. the raw profile data size change with/without the patch.