This is an archive of the discontinued LLVM Phabricator instance.

AMDGPU: Don't run redundant GlobalDCE
Needs Review · Public

Authored by arsenm on Dec 10 2019, 6:04 AM.

Details

Reviewers
rampitec
Summary

The internalize pass is added early, and
GlobalOpt / GlobalDCE are already run later.

Diff Detail

Event Timeline

arsenm created this revision. Dec 10 2019, 6:04 AM
Herald added a project: Restricted Project. Dec 10 2019, 6:04 AM

The idea behind running it was to speed up compilation. After internalize we have an opportunity to remove some functions and avoid spending time optimizing them. As far as I understand, GlobalDCE is really cheap, unlike the passes that run before its next invocation.

Did you make any compile time measurements to support it?

I found this patch lying around from 2017, and don't remember why I was looking at this. There aren't many passes run between the internalize here and GlobalOpt:

Internalize Global Symbols
Dead Global Elimination
Interprocedural Sparse Conditional Constant Propagation
  FunctionPass Manager
    Dominator Tree Construction
Called Value Propagation
Deduce and propagate attributes
Global Variable Optimizer

AFAIR, GlobalDCE was initially added to skip unused library functions after linking. Anyway, do you think GlobalDCE adds any visible slowdown beyond what DT construction already costs?