Page MenuHomePhabricator

[LowerSwitch][AMDGPU] Do not handle impossible values
ClosedPublic

Authored by rtereshin on Feb 11 2019, 9:08 PM.

Details

Summary

This patch adds LazyValueInfo to LowerSwitch to compute the range of the
value being switched over and reduce the size of the tree LowerSwitch
builds to lower a switch.

Diff Detail

Repository
rL LLVM

Event Timeline

rtereshin created this revision.Feb 11 2019, 9:08 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2019, 9:08 PM

I'm not sure I see why LowerSwitch needs to worry about this optimization. Why doesn't SimplifyCFG or DCE or one of some other control flow optimization pass handle this so LowerSwitch doesn't have to worry about it?

lib/Transforms/Utils/LowerSwitch.cpp
83 ↗(On Diff #186392)

Does this need to use the default Function analysis usage?

test/Transforms/LowerSwitch/do-not-handle-impossible-values.ll
6–9 ↗(On Diff #186392)

The test checks are pretty thin. I think it would be better to explicitly check for what comparisons are done (if not just generate these)

I'm not sure I see why LowerSwitch needs to worry about this optimization. Why doesn't SimplifyCFG or DCE or one of some other control flow optimization pass handle this so LowerSwitch doesn't have to worry about it?

Hi Matt,

Thank you for looking into this!

You're right, it is possible to achieve similar results with other passes, and I didn't investigate that beforehand much. Now I did and this is what I found:

  1. The only passes that can help are exactly the ones that use LVI: JumpThreading and CorrelatedValuePropagation (if followed by SimplifyCFG to clean up)
  2. We (a downstream GPU target) can't really do JumpThreading: it does too much and shapes CFG in ways that go against our performance targets
  3. We are (re)considering doing CorrelatedValuePropagation at the moment, but:

3.1) It doesn't do much on top of eliminating dead BBs originated from lowering switches when tested on large suite of real-world shaders (there are some nice selects' eliminations here and there, but very rarely)
3.2) it consumes about 30 times more compile time than this patch using LVI within the LowerSwitch itself costs us on top of TOT LowerSwitch, which changes the extra cost from negligible to requiring consideration
3.3) In fact, it misses some of the cases and it does a poorer job eliminating dead BBs originated from LowerSwitch than this patch even in its current state (and I continue digging, looking at refining the value constraints analysis with computeKnownBits for instance, which can punch holes in the middle of the range with known trailing zeros)

Please let me know if you think this makes sense and if you'd like to see the LVI usage guarded by LowerSwitch's constructor variables & command line options, and if so, what's the acceptable default for it in your opinion.

I will address your other comments a bit later and update the patch, potentially making even more effort figuring out the constraints on the value.

It seems weird to me that doing this somewhere else somehow ends up being more expensive, but this needs a comment somewhere explaining why it should be handled here

lib/Transforms/Utils/LowerSwitch.cpp
145 ↗(On Diff #186392)

This is strange looking. I'm not sure what it really means to disable the dominator tree

448–450 ↗(On Diff #186392)

These can be merged into one LLVM_DEBUG (and single quotes around \n)

It seems weird to me that doing this somewhere else somehow ends up being more expensive, but this needs a comment somewhere explaining why it should be handled here

  1. CorrelatedValuePropagation processes many kinds of instructions, LowerSwitch processes switches only.
  2. Even if limited to icmp instructions only, CorrelatedValuePropagation when ran after LowerSwitch will have to process roughly C icmp's per switch, where C is the number of cases in the switch, while LowerSwitch only needs to call LVI once per switch.
rtereshin marked an inline comment as done.Feb 14 2019, 5:55 PM
rtereshin added inline comments.
lib/Transforms/Utils/LowerSwitch.cpp
145 ↗(On Diff #186392)

According to http://llvm.org/doxygen/classllvm_1_1LazyValueInfo.html#a5b29ad30fb31c6df2a3cbcefef8ae613 it "Disables use of the DominatorTree within LVI." As far as I can tell, the idea is that if LVI is allowed to use DT, DT needs to be valid at every point. As the LowerSwitch pass creates and deletes BBs as it goes, it will mean updating DT along the way using a DomTreeUpdater. Please see

commit 55da8a3a3e0af5afaa51f64c1385b2626c643317
Author: Brian M. Rzycki <brzycki@gmail.com>
Date:   Fri Feb 16 16:35:17 2018 +0000

    [JumpThreading] PR36133 enable/disable DominatorTree for LVI analysis

    Summary:
    The LazyValueInfo pass caches a copy of the DominatorTree when available.
    Whenever there are pending DominatorTree updates within JumpThreading's
    DeferredDominance object we cannot use the cached DT for LVI analysis.
    This commit adds the new methods enableDT() and disableDT() to LVI.
    JumpThreading also sets the appropriate usage model before calling LVI
    analysis methods.

    Fixes https://bugs.llvm.org/show_bug.cgi?id=36133

    Reviewers: sebpop, dberlin, kuhar

    Reviewed by: sebpop, kuhar

    Subscribers: uabelho, llvm-commits, aprantl, hiraditya, a.elovikov

    Differential Revision: https://reviews.llvm.org/D42717

for details.

I did't have time to investigate much neither how profitable it is to have the LVI using DT vs not using DT, nor what the compile time cost of using the updater, but from a quick glance it appeared that LVI only uses DT to do isValidAssumeForContext for assumptions from the AssumptionCache, which is kinda useless at least for us as we rarely if ever have any llvm.assume intrinsic calls on icmp's in the IR. Not to mention, isValidAssumeForContext can handle some of the cases even w/o a DT.

rtereshin marked an inline comment as done.Feb 14 2019, 6:20 PM
rtereshin added inline comments.
lib/Transforms/Utils/LowerSwitch.cpp
448–450 ↗(On Diff #186392)

These can be merged into one LLVM_DEBUG

Sure, will do.

and single quotes around \n

Why?

grep -Eroh "<<\s*['\"]\\\?.['\"]" ./ --include='*.cpp' --include='*.h' --exclude-dir=build | grep -o "['\"]" | sort | uniq -c | sort -n

returns

6492 '
12003 "

(the example output of the first regex:

<< '"'
<< '['
<< '\n'
<< '\t'
<< '\n'
<< '\n'
<< "\n"
<< ":"
<< ":"
<< "\n"
<< "\n"
<< "\n"
<< "\n"
<< "\n"
<< "\n"
<< "\n"
<< " "

the current folder includes TOT llvm w/o clang or other projects)

rtereshin marked an inline comment as done.Feb 14 2019, 11:07 PM
rtereshin added inline comments.
lib/Transforms/Utils/LowerSwitch.cpp
83 ↗(On Diff #186392)

I'm not sure what should be declared preserved here.

I tried removing LowerSwitch from the following pipelines:

-mtriple=amdgcn-amd-amdhsa -mcpu=gfx900
-O0 -mtriple=amdgcn-mesa-mesa3d -mcpu=bonaire
-mtriple=amdgcn-mesa-mesa3d -mcpu=bonaire
-march=r600 -mcpu=redwood

In no case it saved a re-computation of any of the analyses.

Do you have any suggestions?

rtereshin updated this revision to Diff 186985.Feb 15 2019, 3:08 AM
  1. Addressed comments
  2. Refined the switch's operand constraints with known bits and added corresponding tests (all based on real-world cases)

Ping!

Is this good to go in?

Thanks,
Roman

arsenm accepted this revision.Feb 22 2019, 5:40 AM

LGTM

lib/Transforms/Utils/LowerSwitch.cpp
494 ↗(On Diff #186985)

Typo add's

83 ↗(On Diff #186392)

I know if you don't call the base class analysis usage it causes problems with MachineFunctions, but not sure about IR passes

448–450 ↗(On Diff #186392)

I don't remember where I saw this guideline, but apparently the character << is much cheaper for raw_ostream. I also just find using a string for a single character uglier.

This revision is now accepted and ready to land.Feb 22 2019, 5:40 AM
rtereshin marked 3 inline comments as done.Feb 22 2019, 6:19 AM

Thank you!

lib/Transforms/Utils/LowerSwitch.cpp
494 ↗(On Diff #186985)

Not sure what do you mean. By add's I meant add instructions. I will replace it with that to avoid confusion, thanks!

83 ↗(On Diff #186392)

Ah, true, I completely misunderstood you. AFAIK, for function passes the base class usage is empty and doesn't do anything.

448–450 ↗(On Diff #186392)

I guess it depends on the tooling used to build it and operating system, And maybe on the version of LLVM as well.

I checked on macOS (llvm built by clang) as soon as you mentioned this, just printing a million new lines both ways. The time is exactly the same.

As for personal preferences, I guess they are just that. I find it a few keystrokes easier to change between "\n", ".\n", ":\n", " " and whatnot when they are all strings.

This revision was automatically updated to reflect the committed changes.
arsenm added inline comments.Feb 22 2019, 6:46 AM
lib/Transforms/Utils/LowerSwitch.cpp
494 ↗(On Diff #186985)

It's a plural, not a possessive so it should be adds