This is an archive of the discontinued LLVM Phabricator instance.

[X86] Split SDISel call lowering out to its own file
ClosedPublic

Authored by rnk on Jun 29 2023, 4:10 PM.

Details

Summary

X86ISelLowering takes too long to compile. Split call lowering out into
its own file so that developers working on call lowering can be more
productive.

Now one can test calling convention changes with <5s rebuilds. The rest
of X86ISelLowering still takes a long time to compile. Previous
investigations shows that lots of static functions are aggressively
inlined into the two major large switch functions, LowerOperation and
PerformDAGCombine. It may be possible to make further compile time
improvements by blocking inlining into those large dispatch functions.

clang-format complains, but I didn't want to reformat because it will
make it harder for git rename detection and blame tools.

Diff Detail

Event Timeline

rnk created this revision.Jun 29 2023, 4:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2023, 4:10 PM
rnk requested review of this revision.Jun 29 2023, 4:10 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2023, 4:10 PM

Not aware any precedent, but I think it's always good for both reducing compile time and code clear.

llvm/lib/Target/X86/CMakeLists.txt
56

Would it be better to use X86ISelLoweringCall? I think it's more clean to start with X86ISelLowering if we have more such split in the future, and it's easy to understand they are extracted from X86ISelLowering.

N

Not aware any precedent, but I think it's always good for both reducing compile time and code clear.

+1

No objections from me, but it'd be useful to take a look at build insights or something to get a better understanding of where the compile time is going inside X86ISelLowering, and not just blame it on #loc - amdgpu and aarch64 tend to be the bigger buildtimesinks for me on multi-target builds.

rnk added a comment.Jul 27 2023, 1:01 PM

Sorry for the delay, I think this requires some more careful rebasing now.

No objections from me, but it'd be useful to take a look at build insights or something to get a better understanding of where the compile time is going inside X86ISelLowering, and not just blame it on #loc - amdgpu and aarch64 tend to be the bigger buildtimesinks for me on multi-target builds.

It's been years since I dug into it, but I think a lot of the compile time cost came down to lots of stuff getting inlined into the two large switches in this file, LowerOperation and PerformDAGCombine. What happens is that all the static functions local to this file get inlined, and those two functions become very large, slowing down optimization.

I previously attempted to refactor this file by extracting all of the vector shuffle combination logic, but it uses so many static helpers it doesn't have a straightforward way to do that.

I do recall compile time issues with AMDGPU and AArch64, but they tended to be from runaway table generated code, not handwritten code as we have in X86ISelLowering.

llvm/lib/Target/X86/CMakeLists.txt
56

That works for me.

rnk updated this revision to Diff 544961.Jul 27 2023, 3:33 PM
  • rebase
  • rename call lowering file
rnk edited the summary of this revision. (Show Details)Jul 27 2023, 3:33 PM
pengfei accepted this revision.Jul 28 2023, 12:04 AM

LGTM.

This revision is now accepted and ready to land.Jul 28 2023, 12:04 AM
RKSimon accepted this revision.Jul 28 2023, 1:28 AM

LGTM - cheers

This revision was landed with ongoing or failed builds.Jul 28 2023, 9:24 AM
This revision was automatically updated to reflect the committed changes.
llvm/lib/Target/X86/X86ISelLowering.h