This patch adds LazyValueInfo to LowerSwitch to compute the range of the
value being switched over and reduce the size of the tree LowerSwitch
builds to lower a switch.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
I'm not sure I see why LowerSwitch needs to worry about this optimization. Why doesn't SimplifyCFG or DCE or one of some other control flow optimization pass handle this so LowerSwitch doesn't have to worry about it?
lib/Transforms/Utils/LowerSwitch.cpp | ||
---|---|---|
83 ↗ | (On Diff #186392) | Does this need to use the default Function analysis usage? |
test/Transforms/LowerSwitch/do-not-handle-impossible-values.ll | ||
6–9 ↗ | (On Diff #186392) | The test checks are pretty thin. I think it would be better to explicitly check for what comparisons are done (if not just generate these) |
Hi Matt,
Thank you for looking into this!
You're right, it is possible to achieve similar results with other passes, and I didn't investigate that beforehand much. Now I did and this is what I found:
- The only passes that can help are exactly the ones that use LVI: JumpThreading and CorrelatedValuePropagation (if followed by SimplifyCFG to clean up)
- We (a downstream GPU target) can't really do JumpThreading: it does too much and shapes CFG in ways that go against our performance targets
- We are (re)considering doing CorrelatedValuePropagation at the moment, but:
3.1) It doesn't do much on top of eliminating dead BBs originated from lowering switches when tested on large suite of real-world shaders (there are some nice selects' eliminations here and there, but very rarely)
3.2) it consumes about 30 times more compile time than this patch using LVI within the LowerSwitch itself costs us on top of TOT LowerSwitch, which changes the extra cost from negligible to requiring consideration
3.3) In fact, it misses some of the cases and it does a poorer job eliminating dead BBs originated from LowerSwitch than this patch even in its current state (and I continue digging, looking at refining the value constraints analysis with computeKnownBits for instance, which can punch holes in the middle of the range with known trailing zeros)
Please let me know if you think this makes sense and if you'd like to see the LVI usage guarded by LowerSwitch's constructor variables & command line options, and if so, what's the acceptable default for it in your opinion.
I will address your other comments a bit later and update the patch, potentially making even more effort figuring out the constraints on the value.
It seems weird to me that doing this somewhere else somehow ends up being more expensive, but this needs a comment somewhere explaining why it should be handled here
lib/Transforms/Utils/LowerSwitch.cpp | ||
---|---|---|
145 ↗ | (On Diff #186392) | This is strange looking. I'm not sure what it really means to disable the dominator tree |
448–450 ↗ | (On Diff #186392) | These can be merged into one LLVM_DEBUG (and single quotes around \n) |
- CorrelatedValuePropagation processes many kinds of instructions, LowerSwitch processes switches only.
- Even if limited to icmp instructions only, CorrelatedValuePropagation when ran after LowerSwitch will have to process roughly C icmp's per switch, where C is the number of cases in the switch, while LowerSwitch only needs to call LVI once per switch.
lib/Transforms/Utils/LowerSwitch.cpp | ||
---|---|---|
145 ↗ | (On Diff #186392) | According to http://llvm.org/doxygen/classllvm_1_1LazyValueInfo.html#a5b29ad30fb31c6df2a3cbcefef8ae613 it "Disables use of the DominatorTree within LVI." As far as I can tell, the idea is that if LVI is allowed to use DT, DT needs to be valid at every point. As the LowerSwitch pass creates and deletes BBs as it goes, it will mean updating DT along the way using a DomTreeUpdater. Please see commit 55da8a3a3e0af5afaa51f64c1385b2626c643317 Author: Brian M. Rzycki <brzycki@gmail.com> Date: Fri Feb 16 16:35:17 2018 +0000 [JumpThreading] PR36133 enable/disable DominatorTree for LVI analysis Summary: The LazyValueInfo pass caches a copy of the DominatorTree when available. Whenever there are pending DominatorTree updates within JumpThreading's DeferredDominance object we cannot use the cached DT for LVI analysis. This commit adds the new methods enableDT() and disableDT() to LVI. JumpThreading also sets the appropriate usage model before calling LVI analysis methods. Fixes https://bugs.llvm.org/show_bug.cgi?id=36133 Reviewers: sebpop, dberlin, kuhar Reviewed by: sebpop, kuhar Subscribers: uabelho, llvm-commits, aprantl, hiraditya, a.elovikov Differential Revision: https://reviews.llvm.org/D42717 for details. I did't have time to investigate much neither how profitable it is to have the LVI using DT vs not using DT, nor what the compile time cost of using the updater, but from a quick glance it appeared that LVI only uses DT to do isValidAssumeForContext for assumptions from the AssumptionCache, which is kinda useless at least for us as we rarely if ever have any llvm.assume intrinsic calls on icmp's in the IR. Not to mention, isValidAssumeForContext can handle some of the cases even w/o a DT. |
lib/Transforms/Utils/LowerSwitch.cpp | ||
---|---|---|
448–450 ↗ | (On Diff #186392) |
Sure, will do.
Why? grep -Eroh "<<\s*['\"]\\\?.['\"]" ./ --include='*.cpp' --include='*.h' --exclude-dir=build | grep -o "['\"]" | sort | uniq -c | sort -n returns 6492 ' 12003 " (the example output of the first regex: << '"' << '[' << '\n' << '\t' << '\n' << '\n' << "\n" << ":" << ":" << "\n" << "\n" << "\n" << "\n" << "\n" << "\n" << "\n" << " " the current folder includes TOT llvm w/o clang or other projects) |
lib/Transforms/Utils/LowerSwitch.cpp | ||
---|---|---|
83 ↗ | (On Diff #186392) | I'm not sure what should be declared preserved here. I tried removing LowerSwitch from the following pipelines: -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -O0 -mtriple=amdgcn-mesa-mesa3d -mcpu=bonaire -mtriple=amdgcn-mesa-mesa3d -mcpu=bonaire -march=r600 -mcpu=redwood In no case it saved a re-computation of any of the analyses. Do you have any suggestions? |
- Addressed comments
- Refined the switch's operand constraints with known bits and added corresponding tests (all based on real-world cases)
LGTM
lib/Transforms/Utils/LowerSwitch.cpp | ||
---|---|---|
83 ↗ | (On Diff #186392) | I know if you don't call the base class analysis usage it causes problems with MachineFunctions, but not sure about IR passes |
448–450 ↗ | (On Diff #186392) | I don't remember where I saw this guideline, but apparently the character << is much cheaper for raw_ostream. I also just find using a string for a single character uglier. |
494 ↗ | (On Diff #186985) | Typo add's |
Thank you!
lib/Transforms/Utils/LowerSwitch.cpp | ||
---|---|---|
83 ↗ | (On Diff #186392) | Ah, true, I completely misunderstood you. AFAIK, for function passes the base class usage is empty and doesn't do anything. |
448–450 ↗ | (On Diff #186392) | I guess it depends on the tooling used to build it and operating system, And maybe on the version of LLVM as well. I checked on macOS (llvm built by clang) as soon as you mentioned this, just printing a million new lines both ways. The time is exactly the same. As for personal preferences, I guess they are just that. I find it a few keystrokes easier to change between "\n", ".\n", ":\n", " " and whatnot when they are all strings. |
494 ↗ | (On Diff #186985) | Not sure what do you mean. By add's I meant add instructions. I will replace it with that to avoid confusion, thanks! |
lib/Transforms/Utils/LowerSwitch.cpp | ||
---|---|---|
494 ↗ | (On Diff #186985) | It's a plural, not a possessive so it should be adds |