This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
llvm/
-
include/llvm/Analysis/
-
llvm/
-
Analysis/
2/4
ValueLattice.h
-
test/Transforms/SCCP/
-
Transforms/
-
SCCP/
3/7
constant-range-struct.ll
-
widening.ll

Differential D78391

[ValueLattice] Allow two range extensions.
AbandonedPublic

Authored by fhahn on Apr 17 2020, 11:15 AM.

Download Raw Diff

Details

Reviewers

nikic
efriedma
davide
mssimpso

Summary

This patch allows 2 range extensions instead of 1, which leads to some
improvements, as seen in the test cases. On the test suite, it leads to
the following improvements:

test-suite...rks/FreeBench/mason/mason.test 3.00 4.00 33.3%
test-suite...marks/SciMark2-C/scimark2.test 32.00 41.00 28.1%
test-suite...ngs-C/assembler/assembler.test 9.00 11.00 22.2%
test-suite...CFP2000/177.mesa/177.mesa.test 204.00 237.00 16.2%
test-suite...langs-C/football/football.test 96.00 108.00 12.5%
test-suite...oxyApps-C++/miniFE/miniFE.test 37.00 41.00 10.8%
test-suite...rks/FreeBench/pifft/pifft.test 28.00 30.00 7.1%
test-suite...CFP2006/433.milc/433.milc.test 226.00 242.00 7.1%
test-suite...ks/Prolangs-C/agrep/agrep.test 57.00 61.00 7.0%
test-suite.../CINT2000/175.vpr/175.vpr.test 93.00 99.00 6.5%
test-suite...peg2/mpeg2dec/mpeg2decode.test 70.00 74.00 5.7%
test-suite...tions/lambda-0.1.3/lambda.test 19.00 20.00 5.3%
test-suite...urce/Applications/lua/lua.test 289.00 298.00 3.1%
test-suite...ications/JM/ldecod/ldecod.test 163.00 168.00 3.1%
test-suite...TimberWolfMC/timberwolfmc.test 68.00 70.00 2.9%
test-suite...lications/ClamAV/clamscan.test 910.00 933.00 2.5%
test-suite...eeBench/analyzer/analyzer.test 103.00 105.00 1.9%
test-suite...0.perlbench/400.perlbench.test 1619.00 1648.00 1.8%
test-suite.../CINT2000/176.gcc/176.gcc.test 1123.00 1141.00 1.6%
test-suite...T2006/445.gobmk/445.gobmk.test 1631.00 1656.00 1.5%
test-suite...ce/Benchmarks/PAQ8p/paq8p.test 90.00 91.00 1.1%
test-suite.../CINT2006/403.gcc/403.gcc.test 3847.00 3884.00 1.0%
test-suite...006/453.povray/453.povray.test 1773.00 1789.00 0.9%
test-suite...T2006/456.hmmer/456.hmmer.test 124.00 125.00 0.8%
test-suite...ocBench/espresso/espresso.test 147.00 148.00 0.7%
test-suite...nsumer-lame/consumer-lame.test 206.00 207.00 0.5%
test-suite...marks/7zip/7zip-benchmark.test 4875.00 4897.00 0.5%
test-suite...000/186.crafty/186.crafty.test 287.00 288.00 0.3%
test-suite.../Applications/SPASS/SPASS.test 1982.00 1987.00 0.3%
test-suite...nal/skidmarks10/skidmarks.test 428.00 429.00 0.2%
test-suite...3.xalancbmk/483.xalancbmk.test 3912.00 3919.00 0.2%
test-suite...006/447.dealII/447.dealII.test 1057.00 1058.00 0.1%
test-suite...:: External/Povray/povray.test 1538.00 1539.00 0.1%
test-suite...6/464.h264ref/464.h264ref.test 6757.00 6761.00 0.1%

Increasing the number more seems to have relatively little impact, .e.g
for 8 vs instead of 2:

test-suite...ce/Benchmarks/PAQ8p/paq8p.test 91.00 94.00 3.3%
test-suite.../CINT2000/176.gcc/176.gcc.test 1141.00 1164.00 2.0%
test-suite...telecomm-gsm/telecomm-gsm.test 166.00 169.00 1.8%
test-suite...ediabench/gsm/toast/toast.test 166.00 169.00 1.8%
test-suite...000/197.parser/197.parser.test 61.00 62.00 1.6%
test-suite...chmarks/MallocBench/gs/gs.test 341.00 346.00 1.5%
test-suite...lications/ClamAV/clamscan.test 933.00 938.00 0.5%
test-suite...nsumer-lame/consumer-lame.test 207.00 208.00 0.5%
test-suite...3.xalancbmk/483.xalancbmk.test 3919.00 3928.00 0.2%
test-suite...006/450.soplex/450.soplex.test 441.00 442.00 0.2%
test-suite...marks/7zip/7zip-benchmark.test 4897.00 4905.00 0.2%
test-suite...-typeset/consumer-typeset.test 3180.00 3185.00 0.2%
test-suite...006/447.dealII/447.dealII.test 1058.00 1059.00 0.1%
test-suite.../CINT2006/403.gcc/403.gcc.test 3884.00 3886.00 0.1%
test-suite...006/453.povray/453.povray.test 1789.00 1789.00 0.0%

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

fhahn created this revision.Apr 17 2020, 11:15 AM

Herald added a project: Restricted Project. · View Herald TranscriptApr 17 2020, 11:15 AM

Harbormaster failed remote builds in B53756: Diff 258379!Apr 17 2020, 11:21 AM

grandinj added a subscriber: grandinj.Apr 19 2020, 12:12 AM

grandinj added inline comments.

llvm/include/llvm/Analysis/ValueLattice.h
350	s/more than twice/extended more than twice

Add missing 'extended' in comment.

Harbormaster completed remote builds in B53991: Diff 258804.Apr 20 2020, 11:54 AM

nikic added inline comments.Apr 21 2020, 11:42 AM

llvm/include/llvm/Analysis/ValueLattice.h
353	Add `MaxRangeExtensions = 2` static const?
llvm/test/Transforms/SCCP/constant-range-struct.ll
106	Disclaimer: First time I'm looking at SCCP in LLVM. I feel like we really should be handling this already, without support for multiple widenings (and also, we should handle this for cases with more phi arguments). What happens in `@struct2` is that we first visit the entry block, then the `false` block, then the `exit` block (computing everything there), then the `true` block, then `true -> exit` becomes feasible and we update the phi constant ranges, and then merge in new values for all instructions in `exit`, going to overdefined (without widening). If we had instead visited `entry`, `false`, `true`, `exit`, then not only would we have saved on duplicate computations, we would have also avoided the need to perform widening. I feel like the right way to approach this is to fix the block visitation order, and then more widening is something we can do on top of that (which should then only matter for loops).

nikic added inline comments.Apr 21 2020, 11:59 AM

llvm/test/Transforms/SCCP/constant-range-struct.ll
106	Just as a quick experiment, visiting BBs in BFS instead of DFS: https://gist.github.com/nikic/85824a0d14225cd1af5a8b0e25982c0e Some wins, some losses. What we really want is to visit the BBs in RPO order.

fhahn marked an inline comment as done.Apr 21 2020, 1:04 PM

fhahn added inline comments.

llvm/test/Transforms/SCCP/constant-range-struct.ll
106	I feel like the right way to approach this is to fix the block visitation order, and then more widening is something we can do on top of that (which should then only matter for loops). In some cases it boils down to iteration order, yes. What makes things a bit more tricky is that the order of blocks also depends on the current value of the branch conditions (i.e. if we can prove a condition is true/false we don't need to visit one of the predecessors), which in turn depends on the iteration order. While only constants were supported, it did not matter too much, but becomes more relevant with constant ranges. In the example above, we have to execute both true and false straight away, as the condition is overdefined. In other scenarios a different iteration order won't help unfortunately. Granted, RPO would probably be better in general, but more expensive and we probably would have to keep the worklist in RPO order as well. It might be worth a try though (and not matter too much in terms of compile-time). I'll put something together, but the widening extension and iteration order address different issues (although there is some overlap)

nikic added inline comments.Apr 21 2020, 1:38 PM

llvm/test/Transforms/SCCP/constant-range-struct.ll
106	I played around with this a bit more. I think making use of RPO proper may be non-trivial for the interprocedural case. I came up with this hack that prefers visiting blocks where all predecessors are known first: https://gist.github.com/nikic/ab0222b3b0b37b73f96d9a1d47bdd100 I think excluding degenerate cases with a priori unreachable blocks (not blocks found unreachable by SCCP), this should give the same result as visiting in RPO. I'll put something together, but the widening extension and iteration order address different issues (although there is some overlap) I do understand that these address different issues, but I suspect that most of the benefit you're seeing on test-suite is not due to the benefits of widening itself, which should really only manifest in loops, but rather due to avoiding the iteration order issue for the common case of two argument phis. Ideally SCCP as an algorithm should not depend on iteration order (outside of performance concerns), but I can see how this is hard to realize once ranges are involved.

fhahn marked an inline comment as done.Apr 22 2020, 1:56 PM

fhahn added inline comments.

llvm/test/Transforms/SCCP/constant-range-struct.ll
106	I came up with this hack that prefers visiting blocks where all predecessors are known first: https://gist.github.com/nikic/ab0222b3b0b37b73f96d9a1d47bdd100 I think excluding degenerate cases with a priori unreachable blocks (not blocks found unreachable by SCCP), this should give the same result as visiting in RPO. Interesting, thanks for sharing! I gave it a try and it seems to give some nice improvements in some cases, but there are a few notable regressions below (I'm running -O3 LTO for MultiSource & SPEC) It's definitely an interesting direction to explore, but I think it we would have to take a look at the regressions to understand what's going wrong there. IPNumInstRemoved regressions test-suite...peg2/mpeg2dec/mpeg2decode.test 70.00 31.00 -55.7% test-suite...CFP2000/177.mesa/177.mesa.test 202.00 183.00 -9.4% NumInstRemoved test-suite...CFP2000/177.mesa/177.mesa.test 1862.00 839.00 -54.9% test-suite...marks/7zip/7zip-benchmark.test 1445.00 1412.00 -2.3% test-suite.../CINT2000/176.gcc/176.gcc.test 3092.00 3060.00 -1.0% I do understand that these address different issues, but I suspect that most of the benefit you're seeing on test-suite is not due to the benefits of widening itself, which should really only manifest in loops, but rather due to avoiding the iteration order issue for the common case of two argument phis. I collected numbers with your diff as baseline, followed by D78376, followed by this patch. The changes between D78376 and this patch are as follows and look like they are in the same ball-park, presumably covering mostly different/additional cases. (for the mpeg2decode.test case it mostly restores the regressions with the baseline): IPNumInstRemoved D78376. D78391 test-suite...peg2/mpeg2dec/mpeg2decode.test 31.00 74.00 138.7% test-suite...CFP2000/177.mesa/177.mesa.test 185.00 249.00 34.6% test-suite...rks/FreeBench/mason/mason.test 6.00 8.00 33.3% test-suite...ks/Prolangs-C/agrep/agrep.test 75.00 82.00 9.3% test-suite...oxyApps-C++/miniFE/miniFE.test 43.00 47.00 9.3% test-suite...langs-C/football/football.test 116.00 125.00 7.8% test-suite...ications/JM/ldecod/ldecod.test 181.00 195.00 7.7% test-suite...pplications/treecc/treecc.test 80.00 86.00 7.5% test-suite...lications/SIBsim4/SIBsim4.test 18.00 19.00 5.6% test-suite...tions/lambda-0.1.3/lambda.test 23.00 24.00 4.3% test-suite.../CINT2000/175.vpr/175.vpr.test 100.00 104.00 4.0% test-suite...urce/Applications/lua/lua.test 301.00 313.00 4.0% test-suite.../Applications/lemon/lemon.test 30.00 31.00 3.3% test-suite...eeBench/analyzer/analyzer.test 103.00 105.00 1.9% test-suite...lications/ClamAV/clamscan.test 1016.00 1031.00 1.5% The changes between your diff and D78376 are IPNumInstRemoved diff D78376 test-suite...ks/Prolangs-C/agrep/agrep.test 58.00 75.00 29.3% test-suite...cations/hexxagon/hexxagon.test 36.00 42.00 16.7% test-suite...rks/FreeBench/pifft/pifft.test 30.00 32.00 6.7% test-suite...CFP2000/188.ammp/188.ammp.test 72.00 76.00 5.6% test-suite.../CINT2000/254.gap/254.gap.test 171.00 178.00 4.1% test-suite...000/197.parser/197.parser.test 64.00 66.00 3.1% test-suite.../Benchmarks/Bullet/bullet.test 339.00 345.00 1.8% test-suite...TimberWolfMC/timberwolfmc.test 75.00 76.00 1.3% test-suite...pplications/treecc/treecc.test 79.00 80.00 1.3% test-suite.../CINT2000/181.mcf/181.mcf.test 84.00 85.00 1.2% test-suite...CFP2000/177.mesa/177.mesa.test 183.00 185.00 1.1% test-suite.../Applications/sgefa/sgefa.test 115.00 116.00 0.9% test-suite...lications/ClamAV/clamscan.test 1008.00 1016.00 0.8% test-suite...6/482.sphinx3/482.sphinx3.test 141.00 142.00 0.7% test-suite.../CINT2006/403.gcc/403.gcc.test 3994.00 4007.00 0.3% Overall it looks like there are plenty of further improvements down the road :) And I think changes to the iteration order are mostly complementary to this patch. I definitely need to update the tests for this patch though, with more complex cases that are not also solved by changes to the iteration order.

nikic added inline comments.Apr 27 2020, 1:45 PM

llvm/include/llvm/Analysis/ValueLattice.h
353	Or rather `MaxRangeExtensions = 1` and then `++NumRangeExtensions > MaxRangeExtensions`. I only now realized that the code as written only allows extending the range once, not twice. Previously it allowed zero extensions.
llvm/test/Transforms/SCCP/constant-range-struct.ll
106	It's definitely an interesting direction to explore, but I think it we would have to take a look at the regressions to understand what's going wrong there. Yeah, I was definitely too optimistic that the order would match RPO... Now that I think about this again, it probably falls apart with loops. Overall it looks like there are plenty of further improvements down the road :) And I think changes to the iteration order are mostly complementary to this patch. I definitely need to update the tests for this patch though, with more complex cases that are not also solved by changes to the iteration order. Thanks for running the numbers! I agree that these issues are complimentary, and allowing the additional range extension makes sense. I'm also wondering whether it would make sense to invert the current relationship, where phis can be arbitrarily widened, while for everything else widening is limited. Based on a quick prototype (https://gist.github.com/nikic/885119c67456a0d607db730af744e25b) this is sometimes better, sometimes worse (for MaxWidenings=0). I'm not really clear on what the tradeoff here is, regarding the position where we allow widening and where we don't.

fhahn mentioned this in D79036: [SCCP] Switch to widen at PHIs, stores and call edges..Apr 28 2020, 1:37 PM

It's probably worth exploring D79036 first and then apply this on top of D79036. (it seems like there still are additional cases with wins).

llvm/include/llvm/Analysis/ValueLattice.h
353	Yeah, in my wording I considered setting the range as the 'first' extensions. I agree that has potential for confusion :(. I'll update/clarify that.
llvm/test/Transforms/SCCP/constant-range-struct.ll
106	I'm also wondering whether it would make sense to invert the current relationship, where phis can be arbitrarily widened, while for everything else widening is limited. Based on a quick prototype (https://gist.github.com/nikic/885119c67456a0d607db730af744e25b) this is sometimes better, sometimes worse (for MaxWidenings=0). I'm not really clear on what the tradeoff here is, regarding the position where we allow widening and where we don't. Initially it did not matter too much (as nothing besides PHIS and calls had range support) and it probably also causes fewer state-changes as is. Also I did not want to complicate SCCP by adding additional state to track the number of extensions for certain values. I had a look and it seems quite straight forward to do now that ValueLatticeElement tracks the number of extensions though. The question is what sensible limits are for the number of extensions of PHIs (and the other cases which can cause like stores and call edges). For Phis it seems natural to tie the number of extensions to the number of active incoming values (this patch also allows one more, but I have to run the numbers to see if the additional step matters). For calls/stores, it might make sense to tie it to the number of users of the global, but initially I picked an arbitrary value (10). Seems to work quite well. I've put up a patch D79036. It's not quite ready for review yet, I still want to experiment a bit more with different thresholds.

After extensive discussions, it seems like the alternative approach D79036 is slightly beneficial in terms of compile time. Abandoning the patch for now.

fhahn abandoned this revision.May 22 2020, 11:21 AM

fhahn mentioned this in rG01f999ae8871: [SCCP] Switch to widen at PHIs, stores and call edges..May 29 2020, 4:19 AM

Revision Contents

Path

Size

llvm/

include/

llvm/

Analysis/

ValueLattice.h

6 lines

test/

Transforms/

SCCP/

constant-range-struct.ll

12 lines

widening.ll

36 lines

Diff 258379

llvm/include/llvm/Analysis/ValueLattice.h

//===- ValueLattice.h - Value constraint analysis ---------------- C++ --===//		//===- ValueLattice.h - Value constraint analysis ---------------- C++ --===//
		Lint: Lint Inline Actions clang-format not found in user's PATH; not linting file. Lint: Lint: clang-format not found in user's PATH; not linting file.
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

▲ Show 20 Lines • Show All 333 Lines • ▼ Show 20 Lines	bool markConstantRange(ConstantRange NewR,
if (isConstantRange()) {		if (isConstantRange()) {
if (NewR.isEmptySet())		if (NewR.isEmptySet())
return markOverdefined();		return markOverdefined();

Tag = NewTag;		Tag = NewTag;
if (getConstantRange() == NewR)		if (getConstantRange() == NewR)
return Tag != OldTag;		return Tag != OldTag;

// Simple form of widening. If a range is extended multiple times, use the		// Simple form of widening. If a range is more than twice, use the
		grandinjUnsubmitted Done Reply Inline Actions s/more than twice/extended more than twice grandinj: s/more than twice/extended more than twice
// provided WidenCR upper bound if it contains NewR or overdefined		// provided WidenCR upper bound if it contains NewR or overdefined
// otherwise.		// otherwise.
if (Opts.CheckWiden && ++NumRangeExtensions >= 1) {		if (Opts.CheckWiden && ++NumRangeExtensions >= 2) {
		nikicUnsubmitted Not Done Reply Inline Actions Add `MaxRangeExtensions = 2` static const? nikic: Add `MaxRangeExtensions = 2` static const?
		nikicUnsubmitted Not Done Reply Inline Actions Or rather `MaxRangeExtensions = 1` and then `++NumRangeExtensions > MaxRangeExtensions`. I only now realized that the code as written only allows extending the range once, not twice. Previously it allowed zero extensions. nikic: Or rather `MaxRangeExtensions = 1` and then `++NumRangeExtensions > MaxRangeExtensions`. I only…
		fhahnAuthorUnsubmitted Done Reply Inline Actions Yeah, in my wording I considered setting the range as the 'first' extensions. I agree that has potential for confusion :(. I'll update/clarify that. fhahn: Yeah, in my wording I considered setting the range as the 'first' extensions. I agree that has…
// Make sure WidenCR is only used if it contains the merge result.		// Make sure WidenCR is only used if it contains the merge result.
if (!Opts.WidenCR.isFullSet() &&		if (!Opts.WidenCR.isFullSet() &&
(NewR == Opts.WidenCR \|\| Opts.WidenCR.contains(NewR))) {		(NewR == Opts.WidenCR \|\| Opts.WidenCR.contains(NewR))) {
assert(NewR.contains(getConstantRange()));		assert(NewR.contains(getConstantRange()));
bool Changed = Opts.WidenCR != getConstantRange();		bool Changed = Opts.WidenCR != getConstantRange();
Range = std::move(Opts.WidenCR);		Range = std::move(Opts.WidenCR);
NumRangeExtensions = 1;		NumRangeExtensions = 2;
return Changed \|\| Tag != OldTag;		return Changed \|\| Tag != OldTag;
}		}
return markOverdefined();		return markOverdefined();
}		}

assert(NewR.contains(getConstantRange()) &&		assert(NewR.contains(getConstantRange()) &&
"Existing range must be a subset of NewR");		"Existing range must be a subset of NewR");
Range = std::move(NewR);		Range = std::move(NewR);
▲ Show 20 Lines • Show All 113 Lines • Show Last 20 Lines

llvm/test/Transforms/SCCP/constant-range-struct.ll

Show First 20 Lines • Show All 96 Lines • ▼ Show 20 Lines	exit:
ret {i64, i64} %s.2		ret {i64, i64} %s.2
}		}

define void @struct2_caller() {		define void @struct2_caller() {
; CHECK-LABEL: @struct2_caller(		; CHECK-LABEL: @struct2_caller(
; CHECK-NEXT: [[S:%.*]] = call { i64, i64 } @struct2()		; CHECK-NEXT: [[S:%.*]] = call { i64, i64 } @struct2()
; CHECK-NEXT: [[V1:%.*]] = extractvalue { i64, i64 } [[S]], 0		; CHECK-NEXT: [[V1:%.*]] = extractvalue { i64, i64 } [[S]], 0
; CHECK-NEXT: [[V2:%.*]] = extractvalue { i64, i64 } [[S]], 1		; CHECK-NEXT: [[V2:%.*]] = extractvalue { i64, i64 } [[S]], 1
; CHECK-NEXT: [[T_1:%.*]] = icmp ne i64 [[V1]], 10		; CHECK-NEXT: call void @use(i1 true)
; CHECK-NEXT: call void @use(i1 [[T_1]])		; CHECK-NEXT: call void @use(i1 true)
		nikicUnsubmitted Not Done Reply Inline Actions Disclaimer: First time I'm looking at SCCP in LLVM. I feel like we really should be handling this already, without support for multiple widenings (and also, we should handle this for cases with more phi arguments). What happens in `@struct2` is that we first visit the entry block, then the `false` block, then the `exit` block (computing everything there), then the `true` block, then `true -> exit` becomes feasible and we update the phi constant ranges, and then merge in new values for all instructions in `exit`, going to overdefined (without widening). If we had instead visited `entry`, `false`, `true`, `exit`, then not only would we have saved on duplicate computations, we would have also avoided the need to perform widening. I feel like the right way to approach this is to fix the block visitation order, and then more widening is something we can do on top of that (which should then only matter for loops). nikic: Disclaimer: First time I'm looking at SCCP in LLVM. I feel like we really should be handling…
		nikicUnsubmitted Not Done Reply Inline Actions Just as a quick experiment, visiting BBs in BFS instead of DFS: https://gist.github.com/nikic/85824a0d14225cd1af5a8b0e25982c0e Some wins, some losses. What we really want is to visit the BBs in RPO order. nikic: Just as a quick experiment, visiting BBs in BFS instead of DFS: https://gist.github.
		fhahnAuthorUnsubmitted Done Reply Inline Actions I feel like the right way to approach this is to fix the block visitation order, and then more widening is something we can do on top of that (which should then only matter for loops). In some cases it boils down to iteration order, yes. What makes things a bit more tricky is that the order of blocks also depends on the current value of the branch conditions (i.e. if we can prove a condition is true/false we don't need to visit one of the predecessors), which in turn depends on the iteration order. While only constants were supported, it did not matter too much, but becomes more relevant with constant ranges. In the example above, we have to execute both true and false straight away, as the condition is overdefined. In other scenarios a different iteration order won't help unfortunately. Granted, RPO would probably be better in general, but more expensive and we probably would have to keep the worklist in RPO order as well. It might be worth a try though (and not matter too much in terms of compile-time). I'll put something together, but the widening extension and iteration order address different issues (although there is some overlap) fhahn: > I feel like the right way to approach this is to fix the block visitation order, and then…
		nikicUnsubmitted Not Done Reply Inline Actions I played around with this a bit more. I think making use of RPO proper may be non-trivial for the interprocedural case. I came up with this hack that prefers visiting blocks where all predecessors are known first: https://gist.github.com/nikic/ab0222b3b0b37b73f96d9a1d47bdd100 I think excluding degenerate cases with a priori unreachable blocks (not blocks found unreachable by SCCP), this should give the same result as visiting in RPO. I'll put something together, but the widening extension and iteration order address different issues (although there is some overlap) I do understand that these address different issues, but I suspect that most of the benefit you're seeing on test-suite is not due to the benefits of widening itself, which should really only manifest in loops, but rather due to avoiding the iteration order issue for the common case of two argument phis. Ideally SCCP as an algorithm should not depend on iteration order (outside of performance concerns), but I can see how this is hard to realize once ranges are involved. nikic: I played around with this a bit more. I think making use of RPO proper may be non-trivial for…
		fhahnAuthorUnsubmitted Done Reply Inline Actions I came up with this hack that prefers visiting blocks where all predecessors are known first: https://gist.github.com/nikic/ab0222b3b0b37b73f96d9a1d47bdd100 I think excluding degenerate cases with a priori unreachable blocks (not blocks found unreachable by SCCP), this should give the same result as visiting in RPO. Interesting, thanks for sharing! I gave it a try and it seems to give some nice improvements in some cases, but there are a few notable regressions below (I'm running -O3 LTO for MultiSource & SPEC) It's definitely an interesting direction to explore, but I think it we would have to take a look at the regressions to understand what's going wrong there. IPNumInstRemoved regressions test-suite...peg2/mpeg2dec/mpeg2decode.test 70.00 31.00 -55.7% test-suite...CFP2000/177.mesa/177.mesa.test 202.00 183.00 -9.4% NumInstRemoved test-suite...CFP2000/177.mesa/177.mesa.test 1862.00 839.00 -54.9% test-suite...marks/7zip/7zip-benchmark.test 1445.00 1412.00 -2.3% test-suite.../CINT2000/176.gcc/176.gcc.test 3092.00 3060.00 -1.0% I do understand that these address different issues, but I suspect that most of the benefit you're seeing on test-suite is not due to the benefits of widening itself, which should really only manifest in loops, but rather due to avoiding the iteration order issue for the common case of two argument phis. I collected numbers with your diff as baseline, followed by D78376, followed by this patch. The changes between D78376 and this patch are as follows and look like they are in the same ball-park, presumably covering mostly different/additional cases. (for the mpeg2decode.test case it mostly restores the regressions with the baseline): IPNumInstRemoved D78376. D78391 test-suite...peg2/mpeg2dec/mpeg2decode.test 31.00 74.00 138.7% test-suite...CFP2000/177.mesa/177.mesa.test 185.00 249.00 34.6% test-suite...rks/FreeBench/mason/mason.test 6.00 8.00 33.3% test-suite...ks/Prolangs-C/agrep/agrep.test 75.00 82.00 9.3% test-suite...oxyApps-C++/miniFE/miniFE.test 43.00 47.00 9.3% test-suite...langs-C/football/football.test 116.00 125.00 7.8% test-suite...ications/JM/ldecod/ldecod.test 181.00 195.00 7.7% test-suite...pplications/treecc/treecc.test 80.00 86.00 7.5% test-suite...lications/SIBsim4/SIBsim4.test 18.00 19.00 5.6% test-suite...tions/lambda-0.1.3/lambda.test 23.00 24.00 4.3% test-suite.../CINT2000/175.vpr/175.vpr.test 100.00 104.00 4.0% test-suite...urce/Applications/lua/lua.test 301.00 313.00 4.0% test-suite.../Applications/lemon/lemon.test 30.00 31.00 3.3% test-suite...eeBench/analyzer/analyzer.test 103.00 105.00 1.9% test-suite...lications/ClamAV/clamscan.test 1016.00 1031.00 1.5% The changes between your diff and D78376 are IPNumInstRemoved diff D78376 test-suite...ks/Prolangs-C/agrep/agrep.test 58.00 75.00 29.3% test-suite...cations/hexxagon/hexxagon.test 36.00 42.00 16.7% test-suite...rks/FreeBench/pifft/pifft.test 30.00 32.00 6.7% test-suite...CFP2000/188.ammp/188.ammp.test 72.00 76.00 5.6% test-suite.../CINT2000/254.gap/254.gap.test 171.00 178.00 4.1% test-suite...000/197.parser/197.parser.test 64.00 66.00 3.1% test-suite.../Benchmarks/Bullet/bullet.test 339.00 345.00 1.8% test-suite...TimberWolfMC/timberwolfmc.test 75.00 76.00 1.3% test-suite...pplications/treecc/treecc.test 79.00 80.00 1.3% test-suite.../CINT2000/181.mcf/181.mcf.test 84.00 85.00 1.2% test-suite...CFP2000/177.mesa/177.mesa.test 183.00 185.00 1.1% test-suite.../Applications/sgefa/sgefa.test 115.00 116.00 0.9% test-suite...lications/ClamAV/clamscan.test 1008.00 1016.00 0.8% test-suite...6/482.sphinx3/482.sphinx3.test 141.00 142.00 0.7% test-suite.../CINT2006/403.gcc/403.gcc.test 3994.00 4007.00 0.3% Overall it looks like there are plenty of further improvements down the road :) And I think changes to the iteration order are mostly complementary to this patch. I definitely need to update the tests for this patch though, with more complex cases that are not also solved by changes to the iteration order. fhahn: > I came up with this hack that prefers visiting blocks where all predecessors are known first…
		nikicUnsubmitted Not Done Reply Inline Actions It's definitely an interesting direction to explore, but I think it we would have to take a look at the regressions to understand what's going wrong there. Yeah, I was definitely too optimistic that the order would match RPO... Now that I think about this again, it probably falls apart with loops. Overall it looks like there are plenty of further improvements down the road :) And I think changes to the iteration order are mostly complementary to this patch. I definitely need to update the tests for this patch though, with more complex cases that are not also solved by changes to the iteration order. Thanks for running the numbers! I agree that these issues are complimentary, and allowing the additional range extension makes sense. I'm also wondering whether it would make sense to invert the current relationship, where phis can be arbitrarily widened, while for everything else widening is limited. Based on a quick prototype (https://gist.github.com/nikic/885119c67456a0d607db730af744e25b) this is sometimes better, sometimes worse (for MaxWidenings=0). I'm not really clear on what the tradeoff here is, regarding the position where we allow widening and where we don't. nikic: > It's definitely an interesting direction to explore, but I think it we would have to take a…
		fhahnAuthorUnsubmitted Done Reply Inline Actions I'm also wondering whether it would make sense to invert the current relationship, where phis can be arbitrarily widened, while for everything else widening is limited. Based on a quick prototype (https://gist.github.com/nikic/885119c67456a0d607db730af744e25b) this is sometimes better, sometimes worse (for MaxWidenings=0). I'm not really clear on what the tradeoff here is, regarding the position where we allow widening and where we don't. Initially it did not matter too much (as nothing besides PHIS and calls had range support) and it probably also causes fewer state-changes as is. Also I did not want to complicate SCCP by adding additional state to track the number of extensions for certain values. I had a look and it seems quite straight forward to do now that ValueLatticeElement tracks the number of extensions though. The question is what sensible limits are for the number of extensions of PHIs (and the other cases which can cause like stores and call edges). For Phis it seems natural to tie the number of extensions to the number of active incoming values (this patch also allows one more, but I have to run the numbers to see if the additional step matters). For calls/stores, it might make sense to tie it to the number of users of the global, but initially I picked an arbitrary value (10). Seems to work quite well. I've put up a patch D79036. It's not quite ready for review yet, I still want to experiment a bit more with different thresholds. fhahn: > I'm also wondering whether it would make sense to invert the current relationship, where phis…
; CHECK-NEXT: [[T_2:%.*]] = icmp ult i64 [[V1]], 100
; CHECK-NEXT: call void @use(i1 [[T_2]])
; CHECK-NEXT: [[T_3:%.*]] = icmp ne i64 [[V2]], 0		; CHECK-NEXT: [[T_3:%.*]] = icmp ne i64 [[V2]], 0
; CHECK-NEXT: call void @use(i1 [[T_3]])		; CHECK-NEXT: call void @use(i1 [[T_3]])
; CHECK-NEXT: [[T_4:%.*]] = icmp ult i64 [[V2]], 301		; CHECK-NEXT: [[T_4:%.*]] = icmp ult i64 [[V2]], 301
; CHECK-NEXT: call void @use(i1 [[T_4]])		; CHECK-NEXT: call void @use(i1 [[T_4]])
; CHECK-NEXT: [[F_1:%.*]] = icmp eq i64 [[V1]], 10		; CHECK-NEXT: call void @use(i1 false)
; CHECK-NEXT: call void @use(i1 [[F_1]])		; CHECK-NEXT: call void @use(i1 false)
; CHECK-NEXT: [[F_2:%.*]] = icmp ult i64 [[V1]], 19
; CHECK-NEXT: call void @use(i1 [[F_2]])
; CHECK-NEXT: [[F_3:%.*]] = icmp eq i64 [[V2]], 50		; CHECK-NEXT: [[F_3:%.*]] = icmp eq i64 [[V2]], 50
; CHECK-NEXT: call void @use(i1 [[F_3]])		; CHECK-NEXT: call void @use(i1 [[F_3]])
; CHECK-NEXT: [[F_4:%.*]] = icmp ugt i64 [[V2]], 301		; CHECK-NEXT: [[F_4:%.*]] = icmp ugt i64 [[V2]], 301
; CHECK-NEXT: call void @use(i1 [[F_4]])		; CHECK-NEXT: call void @use(i1 [[F_4]])
; CHECK-NEXT: [[C_1:%.*]] = icmp eq i64 [[V1]], 25		; CHECK-NEXT: [[C_1:%.*]] = icmp eq i64 [[V1]], 25
; CHECK-NEXT: call void @use(i1 [[C_1]])		; CHECK-NEXT: call void @use(i1 [[C_1]])
; CHECK-NEXT: [[C_2:%.*]] = icmp ult i64 [[V1]], 25		; CHECK-NEXT: [[C_2:%.*]] = icmp ult i64 [[V1]], 25
; CHECK-NEXT: call void @use(i1 [[C_2]])		; CHECK-NEXT: call void @use(i1 [[C_2]])
Show All 40 Lines

llvm/test/Transforms/SCCP/widening.ll

	Show All 11 Lines
	; SCCP-NEXT: entry:			; SCCP-NEXT: entry:
	; SCCP-NEXT: [[C_1:%.*]] = call i1 @cond()			; SCCP-NEXT: [[C_1:%.*]] = call i1 @cond()
	; SCCP-NEXT: br i1 [[C_1]], label [[BB1:%.]], label [[EXIT:%.]]			; SCCP-NEXT: br i1 [[C_1]], label [[BB1:%.]], label [[EXIT:%.]]
	; SCCP: bb1:			; SCCP: bb1:
	; SCCP-NEXT: br label [[EXIT]]			; SCCP-NEXT: br label [[EXIT]]
	; SCCP: exit:			; SCCP: exit:
	; SCCP-NEXT: [[P:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ 1, [[BB1]] ]			; SCCP-NEXT: [[P:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ 1, [[BB1]] ]
	; SCCP-NEXT: [[A:%.*]] = add i32 [[P]], 1			; SCCP-NEXT: [[A:%.*]] = add i32 [[P]], 1
	; SCCP-NEXT: [[T_1:%.*]] = icmp ult i32 [[A]], 20			; SCCP-NEXT: call void @use(i1 true)
	; SCCP-NEXT: call void @use(i1 [[T_1]])			; SCCP-NEXT: call void @use(i1 false)
	; SCCP-NEXT: [[F_1:%.*]] = icmp ugt i32 [[A]], 10
	; SCCP-NEXT: call void @use(i1 [[F_1]])
	; SCCP-NEXT: ret void			; SCCP-NEXT: ret void
	;			;
	; IPSCCP-LABEL: @test_2_incoming_constants(			; IPSCCP-LABEL: @test_2_incoming_constants(
	; IPSCCP-NEXT: entry:			; IPSCCP-NEXT: entry:
	; IPSCCP-NEXT: [[C_1:%.*]] = call i1 @cond()			; IPSCCP-NEXT: [[C_1:%.*]] = call i1 @cond()
	; IPSCCP-NEXT: br i1 [[C_1]], label [[BB1:%.]], label [[EXIT:%.]]			; IPSCCP-NEXT: br i1 [[C_1]], label [[BB1:%.]], label [[EXIT:%.]]
	; IPSCCP: bb1:			; IPSCCP: bb1:
	; IPSCCP-NEXT: br label [[EXIT]]			; IPSCCP-NEXT: br label [[EXIT]]
	; IPSCCP: exit:			; IPSCCP: exit:
	; IPSCCP-NEXT: [[P:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ 1, [[BB1]] ]			; IPSCCP-NEXT: [[P:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ 1, [[BB1]] ]
	; IPSCCP-NEXT: [[A:%.*]] = add i32 [[P]], 1			; IPSCCP-NEXT: [[A:%.*]] = add i32 [[P]], 1
	; IPSCCP-NEXT: [[T_1:%.*]] = icmp ult i32 [[A]], 20			; IPSCCP-NEXT: call void @use(i1 true)
	; IPSCCP-NEXT: call void @use(i1 [[T_1]])			; IPSCCP-NEXT: call void @use(i1 false)
	; IPSCCP-NEXT: [[F_1:%.*]] = icmp ugt i32 [[A]], 10
	; IPSCCP-NEXT: call void @use(i1 [[F_1]])
	; IPSCCP-NEXT: ret void			; IPSCCP-NEXT: ret void
	;			;
	entry:			entry:
	%c.1 = call i1 @cond()			%c.1 = call i1 @cond()
	br i1 %c.1, label %bb1, label %exit			br i1 %c.1, label %bb1, label %exit

	bb1:			bb1:
	br label %exit			br label %exit
	Show All 16 Lines
	; SCCP: bb1:			; SCCP: bb1:
	; SCCP-NEXT: [[C_2:%.*]] = call i1 @cond()			; SCCP-NEXT: [[C_2:%.*]] = call i1 @cond()
	; SCCP-NEXT: br i1 [[C_2]], label [[BB2:%.*]], label [[EXIT]]			; SCCP-NEXT: br i1 [[C_2]], label [[BB2:%.*]], label [[EXIT]]
	; SCCP: bb2:			; SCCP: bb2:
	; SCCP-NEXT: br label [[EXIT]]			; SCCP-NEXT: br label [[EXIT]]
	; SCCP: exit:			; SCCP: exit:
	; SCCP-NEXT: [[P:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ 1, [[BB1]] ], [ 2, [[BB2]] ]			; SCCP-NEXT: [[P:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ 1, [[BB1]] ], [ 2, [[BB2]] ]
	; SCCP-NEXT: [[A:%.*]] = add i32 [[P]], 1			; SCCP-NEXT: [[A:%.*]] = add i32 [[P]], 1
	; SCCP-NEXT: [[T_1:%.*]] = icmp ult i32 [[A]], 20			; SCCP-NEXT: call void @use(i1 true)
	; SCCP-NEXT: call void @use(i1 [[T_1]])			; SCCP-NEXT: call void @use(i1 false)
	; SCCP-NEXT: [[F_1:%.*]] = icmp ugt i32 [[A]], 10
	; SCCP-NEXT: call void @use(i1 [[F_1]])
	; SCCP-NEXT: ret void			; SCCP-NEXT: ret void
	;			;
	; IPSCCP-LABEL: @test_3_incoming_constants(			; IPSCCP-LABEL: @test_3_incoming_constants(
	; IPSCCP-NEXT: entry:			; IPSCCP-NEXT: entry:
	; IPSCCP-NEXT: [[C_1:%.*]] = call i1 @cond()			; IPSCCP-NEXT: [[C_1:%.*]] = call i1 @cond()
	; IPSCCP-NEXT: br i1 [[C_1]], label [[BB1:%.]], label [[EXIT:%.]]			; IPSCCP-NEXT: br i1 [[C_1]], label [[BB1:%.]], label [[EXIT:%.]]
	; IPSCCP: bb1:			; IPSCCP: bb1:
	; IPSCCP-NEXT: [[C_2:%.*]] = call i1 @cond()			; IPSCCP-NEXT: [[C_2:%.*]] = call i1 @cond()
	; IPSCCP-NEXT: br i1 [[C_2]], label [[BB2:%.*]], label [[EXIT]]			; IPSCCP-NEXT: br i1 [[C_2]], label [[BB2:%.*]], label [[EXIT]]
	; IPSCCP: bb2:			; IPSCCP: bb2:
	; IPSCCP-NEXT: br label [[EXIT]]			; IPSCCP-NEXT: br label [[EXIT]]
	; IPSCCP: exit:			; IPSCCP: exit:
	; IPSCCP-NEXT: [[P:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ 1, [[BB1]] ], [ 2, [[BB2]] ]			; IPSCCP-NEXT: [[P:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ 1, [[BB1]] ], [ 2, [[BB2]] ]
	; IPSCCP-NEXT: [[A:%.*]] = add i32 [[P]], 1			; IPSCCP-NEXT: [[A:%.*]] = add i32 [[P]], 1
	; IPSCCP-NEXT: [[T_1:%.*]] = icmp ult i32 [[A]], 20			; IPSCCP-NEXT: call void @use(i1 true)
	; IPSCCP-NEXT: call void @use(i1 [[T_1]])			; IPSCCP-NEXT: call void @use(i1 false)
	; IPSCCP-NEXT: [[F_1:%.*]] = icmp ugt i32 [[A]], 10
	; IPSCCP-NEXT: call void @use(i1 [[F_1]])
	; IPSCCP-NEXT: ret void			; IPSCCP-NEXT: ret void
	;			;
	entry:			entry:
	%c.1 = call i1 @cond()			%c.1 = call i1 @cond()
	br i1 %c.1, label %bb1, label %exit			br i1 %c.1, label %bb1, label %exit

	bb1:			bb1:
	%c.2 = call i1 @cond()			%c.2 = call i1 @cond()
	Show All 26 Lines
	; SCCP: bb3:			; SCCP: bb3:
	; SCCP-NEXT: [[C_4:%.*]] = call i1 @cond()			; SCCP-NEXT: [[C_4:%.*]] = call i1 @cond()
	; SCCP-NEXT: br i1 [[C_4]], label [[BB4:%.*]], label [[EXIT]]			; SCCP-NEXT: br i1 [[C_4]], label [[BB4:%.*]], label [[EXIT]]
	; SCCP: bb4:			; SCCP: bb4:
	; SCCP-NEXT: br label [[EXIT]]			; SCCP-NEXT: br label [[EXIT]]
	; SCCP: exit:			; SCCP: exit:
	; SCCP-NEXT: [[P:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ 1, [[BB1]] ], [ 2, [[BB2]] ], [ 3, [[BB3]] ], [ 4, [[BB4]] ]			; SCCP-NEXT: [[P:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ 1, [[BB1]] ], [ 2, [[BB2]] ], [ 3, [[BB3]] ], [ 4, [[BB4]] ]
	; SCCP-NEXT: [[A:%.*]] = add i32 [[P]], 1			; SCCP-NEXT: [[A:%.*]] = add i32 [[P]], 1
	; SCCP-NEXT: [[T_1:%.*]] = icmp ult i32 [[A]], 20			; SCCP-NEXT: call void @use(i1 true)
	; SCCP-NEXT: call void @use(i1 [[T_1]])			; SCCP-NEXT: call void @use(i1 false)
	; SCCP-NEXT: [[F_1:%.*]] = icmp ugt i32 [[A]], 10
	; SCCP-NEXT: call void @use(i1 [[F_1]])
	; SCCP-NEXT: ret void			; SCCP-NEXT: ret void
	;			;
	; IPSCCP-LABEL: @test_5_incoming_constants(			; IPSCCP-LABEL: @test_5_incoming_constants(
	; IPSCCP-NEXT: entry:			; IPSCCP-NEXT: entry:
	; IPSCCP-NEXT: [[C_1:%.*]] = call i1 @cond()			; IPSCCP-NEXT: [[C_1:%.*]] = call i1 @cond()
	; IPSCCP-NEXT: br i1 [[C_1]], label [[BB1:%.]], label [[EXIT:%.]]			; IPSCCP-NEXT: br i1 [[C_1]], label [[BB1:%.]], label [[EXIT:%.]]
	; IPSCCP: bb1:			; IPSCCP: bb1:
	; IPSCCP-NEXT: [[C_2:%.*]] = call i1 @cond()			; IPSCCP-NEXT: [[C_2:%.*]] = call i1 @cond()
	; IPSCCP-NEXT: br i1 [[C_2]], label [[BB2:%.*]], label [[EXIT]]			; IPSCCP-NEXT: br i1 [[C_2]], label [[BB2:%.*]], label [[EXIT]]
	; IPSCCP: bb2:			; IPSCCP: bb2:
	; IPSCCP-NEXT: [[C_3:%.*]] = call i1 @cond()			; IPSCCP-NEXT: [[C_3:%.*]] = call i1 @cond()
	; IPSCCP-NEXT: br i1 [[C_3]], label [[BB3:%.*]], label [[EXIT]]			; IPSCCP-NEXT: br i1 [[C_3]], label [[BB3:%.*]], label [[EXIT]]
	; IPSCCP: bb3:			; IPSCCP: bb3:
	; IPSCCP-NEXT: [[C_4:%.*]] = call i1 @cond()			; IPSCCP-NEXT: [[C_4:%.*]] = call i1 @cond()
	; IPSCCP-NEXT: br i1 [[C_4]], label [[BB4:%.*]], label [[EXIT]]			; IPSCCP-NEXT: br i1 [[C_4]], label [[BB4:%.*]], label [[EXIT]]
	; IPSCCP: bb4:			; IPSCCP: bb4:
	; IPSCCP-NEXT: br label [[EXIT]]			; IPSCCP-NEXT: br label [[EXIT]]
	; IPSCCP: exit:			; IPSCCP: exit:
	; IPSCCP-NEXT: [[P:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ 1, [[BB1]] ], [ 2, [[BB2]] ], [ 3, [[BB3]] ], [ 4, [[BB4]] ]			; IPSCCP-NEXT: [[P:%.]] = phi i32 [ 0, [[ENTRY:%.]] ], [ 1, [[BB1]] ], [ 2, [[BB2]] ], [ 3, [[BB3]] ], [ 4, [[BB4]] ]
	; IPSCCP-NEXT: [[A:%.*]] = add i32 [[P]], 1			; IPSCCP-NEXT: [[A:%.*]] = add i32 [[P]], 1
	; IPSCCP-NEXT: [[T_1:%.*]] = icmp ult i32 [[A]], 20			; IPSCCP-NEXT: call void @use(i1 true)
	; IPSCCP-NEXT: call void @use(i1 [[T_1]])			; IPSCCP-NEXT: call void @use(i1 false)
	; IPSCCP-NEXT: [[F_1:%.*]] = icmp ugt i32 [[A]], 10
	; IPSCCP-NEXT: call void @use(i1 [[F_1]])
	; IPSCCP-NEXT: ret void			; IPSCCP-NEXT: ret void
	;			;
	entry:			entry:
	%c.1 = call i1 @cond()			%c.1 = call i1 @cond()
	br i1 %c.1, label %bb1, label %exit			br i1 %c.1, label %bb1, label %exit

	bb1:			bb1:
	%c.2 = call i1 @cond()			%c.2 = call i1 @cond()
	▲ Show 20 Lines • Show All 709 Lines • Show Last 20 Lines