This is an archive of the discontinued LLVM Phabricator instance.

[Inline] Attempt to delete any discardable if unused functions
ClosedPublic

Authored by aeubanks on Dec 10 2021, 11:40 AM.

Diff Detail

Event Timeline

aeubanks created this revision.Dec 10 2021, 11:40 AM
aeubanks requested review of this revision.Dec 10 2021, 11:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 10 2021, 11:40 AM
nikic requested changes to this revision.Dec 10 2021, 12:13 PM
nikic added a subscriber: nikic.

I believe that for non-local linkage you also need to check for comdats. You can only drop the comdat as a whole, not an individual function.

This revision now requires changes to proceed.Dec 10 2021, 12:13 PM
aeubanks updated this revision to Diff 394040.Dec 13 2021, 1:35 PM

check for comdats
this causes compile time regressions due to calling LazyCallGraph::removeDeadFunction() more, seeing if that can be optimized

nikic added a comment.Dec 14 2021, 5:18 AM

this causes compile time regressions due to calling LazyCallGraph::removeDeadFunction() more, seeing if that can be optimized

That seems surprising, what's so expensive about that method?

this causes compile time regressions due to calling LazyCallGraph::removeDeadFunction() more, seeing if that can be optimized

That seems surprising, what's so expensive about that method?

Weird, I remeasured and it didn't show up as significant anymore.

Now filterDeadComdatFunctions() is extremely expensive, which makes sense since we're iterating over every function/global in the module on every inliner run on an SCC. Not sure there's a good way around this unless we start keeping track of comdat users within the comdat.
Shame since this saves a lot of memory of template heavy code: https://llvm-compile-time-tracker.com/compare.php?from=d66323c1ef28104ccf7d72e05954fc1241845a33&to=c0a6e811b81deee725a2eaf37662d7181e11e704&stat=max-rss. Which I actually don't really understand. I would have thought that we're already clearing function analyses due to the recently added eager invalidation flag, and I wouldn't think that the IR itself takes up that much memory compared to analyses, but maybe I'm wrong.

nikic added a comment.Dec 16 2021, 1:57 AM

this causes compile time regressions due to calling LazyCallGraph::removeDeadFunction() more, seeing if that can be optimized

That seems surprising, what's so expensive about that method?

Weird, I remeasured and it didn't show up as significant anymore.

Now filterDeadComdatFunctions() is extremely expensive, which makes sense since we're iterating over every function/global in the module on every inliner run on an SCC. Not sure there's a good way around this unless we start keeping track of comdat users within the comdat.

Keeping track of users makes sense to me, and doesn't seem overly complicated. I've put up a prototype at D115864.

Shame since this saves a lot of memory of template heavy code: https://llvm-compile-time-tracker.com/compare.php?from=d66323c1ef28104ccf7d72e05954fc1241845a33&to=c0a6e811b81deee725a2eaf37662d7181e11e704&stat=max-rss. Which I actually don't really understand. I would have thought that we're already clearing function analyses due to the recently added eager invalidation flag, and I wouldn't think that the IR itself takes up that much memory compared to analyses, but maybe I'm wrong.

I believe we'd keep analyses queried on the callee (which should be BFI -> LI -> DT), as we'll only invalidate analysis on the caller after inlining is done. Which does seem like the right thing to do, otherwise we'd recompute those every time a callee is considered. But I also don't think these analyses are particularly large either.

nikic added a comment.Dec 16 2021, 3:30 AM

this causes compile time regressions due to calling LazyCallGraph::removeDeadFunction() more, seeing if that can be optimized

That seems surprising, what's so expensive about that method?

Weird, I remeasured and it didn't show up as significant anymore.

Now filterDeadComdatFunctions() is extremely expensive, which makes sense since we're iterating over every function/global in the module on every inliner run on an SCC. Not sure there's a good way around this unless we start keeping track of comdat users within the comdat.

Keeping track of users makes sense to me, and doesn't seem overly complicated. I've put up a prototype at D115864.

Though testing that patch with this one the compile-time impact seems to be about the same, so either I did something wrong or the problem isn't with the comdat scan.

nikic accepted this revision.Jan 7 2022, 12:22 AM

LGTM

llvm/test/Transforms/LoopUnroll/2011-08-09-PhiUpdate.ll
2 ↗(On Diff #394040)
This revision is now accepted and ready to land.Jan 7 2022, 12:22 AM
aeubanks edited the summary of this revision. (Show Details)Jan 7 2022, 11:05 AM
This revision was landed with ongoing or failed builds.Jan 7 2022, 11:05 AM
This revision was automatically updated to reflect the committed changes.

Hm, it looks like this breaks the test-suite build under ReleaseLTO-g: https://llvm-compile-time-tracker.com/show_error.php?commit=bccffe3f8d5dd4dda884c9ac1f93e51772519cad

taking a look

aeubanks reopened this revision.Jan 7 2022, 4:50 PM

reduced repro

$ opt -passes=inline -o /dev/null -S a.ll
$ cat /tmp/a.ll
%a = type { i8*, i8* }

$f3 = comdat any

define linkonce_odr void @f1() {
  %call5 = call void (i64, %a*)* @f2(void (i64, %a*)* @f3)
  ret void
}

define linkonce_odr void (i64, %a*)* @f2(void (i64, %a*)* %__f) {
  call void @llvm.dbg.value(metadata void (i64, %a*)* %__f, metadata !2, metadata !DIExpression()), !dbg !10
  call void %__f(i64 0, %a* null)
  ret void (i64, %a*)* null
}

define linkonce_odr void @f3(i64 %t.coerce0, %a* %t.coerce1) comdat {
  ret void
}

; Function Attrs: nofree nosync nounwind readnone speculatable willreturn
declare void @llvm.dbg.value(metadata, metadata, metadata) #0

attributes #0 = { nofree nosync nounwind readnone speculatable willreturn }

!llvm.module.flags = !{!0, !1}

!0 = !{i32 7, !"Dwarf Version", i32 4}
!1 = !{i32 2, !"Debug Info Version", i32 3}
!2 = !DILocalVariable(name: "__f", arg: 3, scope: !3, file: !4, line: 3814, type: !9)
!3 = distinct !DISubprogram(scope: !5, file: !4, line: 3814, type: !6, scopeLine: 3815, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !8, templateParams: !7, retainedNodes: !7)
!4 = !DIFile(filename: "a", directory: "")
!5 = !DINamespace(name: "std", scope: null)
!6 = !DISubroutineType(types: !7)
!7 = !{}
!8 = distinct !DICompileUnit(language: DW_LANG_C_plus_plus_14, file: !4, isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !7, retainedTypes: !7, globals: !7, imports: !7, splitDebugInlining: false, nameTableKind: None)
!9 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !6, size: 64)
!10 = !DILocation(line: 0, scope: !3)

the crash is because use_empty() doesn't account for metadata uses and eventually we end up with

define linkonce_odr void @f1() {                                                                                                                                                                   
  call void @llvm.dbg.value(metadata void (i64, %a*)* @f3, metadata !2, metadata !DIExpression()), !dbg !10                                                                                        
  ret void                                                                                                                                                                                         
}                                                                                                                                                                                                  
                                                                                                                                                                                                   
declare linkonce_odr void @f3(i64 %t.coerce0, %a* %t.coerce1) comdat

and we delete @f3() even with the metadata reference

This revision is now accepted and ready to land.Jan 7 2022, 4:50 PM
aeubanks updated this revision to Diff 398275.Jan 7 2022, 4:52 PM

fix crash

nikic added a subscriber: mtrofin.Jan 8 2022, 12:52 AM
nikic added inline comments.
llvm/lib/Transforms/IPO/Inliner.cpp
952

I don't think this is the right way to fix it. The whole premise of metadata users is that they automatically get dropped when the value is destroyed.

I think the problem here is that for the comdat case, we call recordInlining() rather than recordInliningWithCalleeDeleted(), and apparently for some reason the InlineAdvisor is actually responsible for destroying functions, so the function ends up only getting removed from the module, but not erased, and metadata users never get dropped. This means that other cleanup also won't happen, not just the incorrect metadata uses.

I think we either need to restrict this to the case where the comdat is trivial (only has one entry) and we can make the decision immediately, or we need to adjust the advisor API to separate out "recordInlining" from "recordCalleeDeleted". Not sure how that would interact with the ML inline advisor though, cc @mtrofin.

I'd recommend to go for the trivial comdat check for now, because I think that's the main interesting case anyway.

mtrofin added inline comments.Jan 9 2022, 6:30 PM
llvm/lib/Transforms/IPO/Inliner.cpp
952

I don't think this is the right way to fix it. The whole premise of metadata users is that they automatically get dropped when the value is destroyed.

I think the problem here is that for the comdat case, we call recordInlining() rather than recordInliningWithCalleeDeleted(), and apparently for some reason the InlineAdvisor is actually responsible for destroying functions, so the function ends up only getting removed from the module, but not erased, and metadata users never get dropped. This means that other cleanup also won't happen, not just the incorrect metadata uses.

I think we either need to restrict this to the case where the comdat is trivial (only has one entry) and we can make the decision immediately, or we need to adjust the advisor API to separate out "recordInlining" from "recordCalleeDeleted". Not sure how that would interact with the ML inline advisor though, cc @mtrofin.

I'd recommend to go for the trivial comdat check for now, because I think that's the main interesting case anyway.

952

The reason the advisor took the responsibility of actually deleting (i.e. delete - deleting) dead functions was to allow it to use function pointers to track some module-wide state. So then all other cleanup should have been done here, but the actual object is deleted at the end. We missed the cleanup in GlobalObject.

I think the better approach is to use Node (LazySCC::Node). I think I should do that change, remove the deleting responsibility from the Advisor, then we can delete functions at line 1048 down, and then, like @nikic observed, we should call here recordInliningWithCaleeDeleted, because we want to track if the effect of an inlining simplifies the call graph. If that sounds reasonable, I'll have the patch up shortly.

BTW - so if not all linkonce_odr functions can be deleted, then what happens, they'd still be emitted, but then, at link time, they'd be dropped?

mtrofin added inline comments.Jan 10 2022, 11:23 AM
llvm/lib/Transforms/IPO/Inliner.cpp
952

D116964 for removing the deleting responsibility from the Advisor.

aeubanks updated this revision to Diff 399190.Jan 11 2022, 8:35 PM

rebase past D116964
drop metadata use check

nikic accepted this revision.Jan 12 2022, 7:40 AM

LGTM

mtrofin added inline comments.Jan 12 2022, 8:11 AM
llvm/lib/Transforms/IPO/Inliner.cpp
966

can you make Advice->recordInliningWithCalleeDeleted happen in this case, too?

nikic added inline comments.Jan 12 2022, 8:29 AM
llvm/lib/Transforms/IPO/Inliner.cpp
966

I'm not sure that would be the right modelling. I believe the reasoning here is that even if we drop a linkonce_odr function in this module, there may be many other modules that also define the same function. It will only be really deleted if it is deleted from *all* modules. Otherwise dropping it in just one module has no effect after linking. I think this is why the inlining cost model does not apply the last call to local to linkonce_odr functions. And I would expect that the advisor would follow similar reasoning.

mtrofin accepted this revision.Jan 12 2022, 8:35 AM
mtrofin added inline comments.
llvm/lib/Transforms/IPO/Inliner.cpp
966

It would hint to the ML Advisor (during training) that it did "a good thing" because, locally, at this module level, we did the best we could have for such a function (i.e. deleted it). But we can try this first ourselves and see if it significantly helps or not.

This revision was landed with ongoing or failed builds.Jan 12 2022, 8:42 AM
This revision was automatically updated to reflect the committed changes.

This seems to cause a reproducible infinite loop inside the inliner on some inputs. I haven't narrowed down a repro case yet, but I'm going to revert this in the meantime.

aeubanks reopened this revision.Jan 13 2022, 9:29 AM

I looked at the repro and I think that this is slowing down specific compiles a lot rather than hanging indefinitely. Taking a closer look.

This revision is now accepted and ready to land.Jan 13 2022, 9:29 AM

a flame graph shows that removeDeadConstantUsers() is taking up almost half of the compile time in the repro

nikic added a comment.Jan 13 2022, 9:41 AM

a flame graph shows that removeDeadConstantUsers() is taking up almost half of the compile time in the repro

Switching to hasOneLiveUse() instead should avoid that, as it will short-circuit.