If the caller can preserve the OBB, we can avoid recomputing the order
for each getDependency call.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 29588 Build 29587: arc lint + arc unit
Event Timeline
I was working on restarting the in-instruction order tracking discussion, but the pathological test case I came up with didn't benefit from it:
const std::vector<unsigned> &lookup(std::string k) { static std::map<std::string, std::vector<unsigned>> the_map = { {"s", {1, 2, 3}}, {"s", {1, 2, 3}}, ... repeat 200x, or arbitrarily long }; return the_map[k]; }
This causes clang to generate a very large BB that DSE spends a lot of time on. However, invalidating the order numbers only on removal wasn't enough to drastically improve performance, so I lost the motivation to drive it through. The ordering helped significantly with my less simplified test case (std::map<unsigned, std::vector<StringRef>>, see rC345329 which removed it), but I wasn't happy with it.
llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h | ||
---|---|---|
456–457 | This function only has one real call site, which is in getPointerDependencyFrom. Can you make these parameters non-optional? | |
llvm/lib/Analysis/MemoryDependenceAnalysis.cpp | ||
446–450 | I've seen this pattern across LLVM, and I've never particularly liked it. I don't think it's as easy to optimize as the author probably thought it was. I think this would be simpler: unsigned DefaultLimit = BlockScanLimit; if (!Limit) Limit = &DefaultLimit; ... |
Yes, it looks like there is another compile time problem in DSE, which is independent of the order query issue. I have an input file with a huge number of stores, and the majority of the time is spent getting dependency info (but not in OrderedBB). It is probably the same issue you are seeing. Did you create a PR for it already?
@rnk is there any chance you still have the full reproducer? I did not manage to reproduce the long time spent in DSE with the snippet above and I cannot share the reproducer I have unfortunately :(
Try this bitcode I attached in the tracker:
https://bugs.llvm.org/attachment.cgi?id=21689
This is the time report I see for it:
$ opt -time-passes -O2 longbbinit.bc -o NUL ===-------------------------------------------------------------------------=== ... Pass execution timing report ... ===-------------------------------------------------------------------------=== Total Execution Time: 16.4531 seconds (16.4770 wall clock) ---User Time--- --System Time-- --User+System-- ---Wall Time--- --- Name --- 9.8281 ( 64.8%) 1.2188 ( 95.1%) 11.0469 ( 67.1%) 11.0376 ( 67.0%) Dead Store Elimination 1.7344 ( 11.4%) 0.0000 ( 0.0%) 1.7344 ( 10.5%) 1.6974 ( 10.3%) Global Value Numbering 0.6563 ( 4.3%) 0.0156 ( 1.2%) 0.6719 ( 4.1%) 0.6795 ( 4.1%) Function Integration/Inlining 0.3750 ( 2.5%) 0.0000 ( 0.0%) 0.3750 ( 2.3%) 0.3846 ( 2.3%) Value Propagation 0.3750 ( 2.5%) 0.0000 ( 0.0%) 0.3750 ( 2.3%) 0.3743 ( 2.3%) Value Propagation #2 0.2344 ( 1.5%) 0.0156 ( 1.2%) 0.2500 ( 1.5%) 0.2515 ( 1.5%) Bitcode Writer
I'm not sure it's the exact issue you're looking at here, though.
This function only has one real call site, which is in getPointerDependencyFrom. Can you make these parameters non-optional?