This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] Avoid template for generalized pattern match.
Needs ReviewPublic

Authored by fakepaper56 on May 4 2023, 8:08 AM.

Details

Summary

D141891 introduced an approach to make functions to serve non-vp nodes and vp
nodes. The old patch used template to make functions have different MatchContext
classes. There is a concern that using template for many functions in
DAGCombiner.cpp may expand the binary too much.
The patch replaces template by selecting corresponding MatchContext class in
runtime.

Diff Detail

Event Timeline

fakepaper56 created this revision.May 4 2023, 8:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2023, 8:08 AM
fakepaper56 requested review of this revision.May 4 2023, 8:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 4 2023, 8:08 AM
simoll added a comment.May 8 2023, 1:16 AM

There is a concern that using template for many functions in
DAGCombiner.cpp may make the binary too large.

Could you provide more details on this concern, eg where was it raised? To put this discussion in perspective, looking at vanilla LLVM 14, a debug build of DAGCombiner.cpp.o has 7MB vs 1322MB for libLLVM-14.so (with X86,RISCV). I would add that this approach adds virtual dispatch on the standard DAGCombiner path, which has a performance cost. IMHO, the question is what a good tradeoff would be.

fakepaper56 added a comment.EditedMay 8 2023, 5:00 AM

DAGCombiner.cpp may make the binary too large.

Could you provide more details on this concern, eg where was it raised? To put this discussion in perspective, looking at vanilla LLVM 14, a debug build of DAGCombiner.cpp.o has 7MB vs 1322MB for libLLVM-14.so (with X86,RISCV). I would add that this approach adds virtual dispatch on the standard DAGCombiner path, which has a performance cost. IMHO, the question is what a good tradeoff would be.

I am sorry that the story is stupid. Actually I didn't encounter any problems for the binary size and I just ignored the performance cost of virtual dispatch. After putting the cost of virtual dispatch into consideration, I think the trade-off problem is too big for me and maybe too early to be considered.

fakepaper56 edited the summary of this revision. (Show Details)May 8 2023, 5:01 AM

There is a concern that using template for many functions in
DAGCombiner.cpp may make the binary too large.

Could you provide more details on this concern, eg where was it raised? To put this discussion in perspective, looking at vanilla LLVM 14, a debug build of DAGCombiner.cpp.o has 7MB vs 1322MB for libLLVM-14.so (with X86,RISCV). I would add that this approach adds virtual dispatch on the standard DAGCombiner path, which has a performance cost. IMHO, the question is what a good tradeoff would be.

I raised it an internal SiFive discussion. I agree a virtual dispatch and a heap allocation are not an improvement. We haven't applied this generic matcher to much code yet and if we started applying it aggressively we may double the size of DAGCombiner.cpp.o. So I thought it was worth thinking about alternative abstractions that didn't duplicate entire functions. I just don't have any good ideas yet.

simoll added a comment.May 9 2023, 7:23 AM

There is a concern that using template for many functions in
DAGCombiner.cpp may make the binary too large.

Could you provide more details on this concern, eg where was it raised? To put this discussion in perspective, looking at vanilla LLVM 14, a debug build of DAGCombiner.cpp.o has 7MB vs 1322MB for libLLVM-14.so (with X86,RISCV). I would add that this approach adds virtual dispatch on the standard DAGCombiner path, which has a performance cost. IMHO, the question is what a good tradeoff would be.

I raised it an internal SiFive discussion. I agree a virtual dispatch and a heap allocation are not an improvement. We haven't applied this generic matcher to much code yet and if we started applying it aggressively we may double the size of DAGCombiner.cpp.o. So I thought it was worth thinking about alternative abstractions that didn't duplicate entire functions. I just don't have any good ideas yet.

You don't need to rewrite the templated code to have the option for virtual dispatch in the future:
If the need arises, we could implement one SuperMatchContext that defers to EmptyContext and VPMatchContext internally (via subclasses or otherwise) and only instantiate the templates for SuperMatchContext once.
(If you go down this route, there is a bunch of cool things you can do, eg combining different matchers or running matcher code with multiple contexts in parallel).