This is an archive of the discontinued LLVM Phabricator instance.

Reorder SimplifyCFG and SROA?
Abandoned · Public

Authored by tjablin on Jun 13 2016, 4:31 PM.


Details

Reviewers

chandlerc
cycheng
hfinkel

Summary

Hi,
I'd like to propose changing the order in which the SimplifyCFG and SROA passes run. Some optimizations in SimplifyCFG are enabled by SROA. Presently, SimplifyCFG runs both before and after SROA, but some less advantageous SimplifyCFG transformations can fire in the first run and deny better SimplifyCFG transformations a chance. Changing the order of these two passes will resolve performance bug 27555 (https://llvm.org/bugs/show_bug.cgi?id=27555). I don't know how the ordering of these two passes was decided initially, so I am hoping someone can tell me whether this order was deliberate.

Specifically, there are two functions in SimplifyCFG that deal with long sequences of comparisons: FoldValueComparisonIntoPredecessors and FoldingBranchToCommonDest. These two transformations have overlapping opportunities, but when both are applicable, FoldValueComparisonIntoPredecessors is typically better. Presently, even though FoldValueComparisonIntoPredecessors runs before FoldingBranchToCommonDest, FoldingBranchToCommonDest wins, because FoldValueComparisonIntoPredecessors can only fire after values have been promoted to registers, whereas FoldingBranchToCommonDest can fire before.

FoldingBranchToCommonDest transforms code that looks like:

if (A) goto LabelX
if (B) goto LabelX
if (C) goto LabelX
...
goto LabelY

into:

if(A || B || C || ...) goto LabelX
goto LabelY

where A, B, C, etc are non-side-effecting expressions.
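As a source-level illustration only (the pass itself operates on LLVM IR basic blocks, not on C++), the two shapes can be sketched as below. The function names `beforeFold`, `afterFold`, and `shapesAgree` are hypothetical. The non-short-circuit `|` is used on purpose, on the assumption that the folded form evaluates every condition unconditionally, which is why the conditions must be free of side effects.

```cpp
#include <cassert>

// Shape before the fold: one conditional branch per condition,
// all of them targeting the same label.
int beforeFold(bool a, bool b, bool c) {
  if (a) goto LabelX;
  if (b) goto LabelX;
  if (c) goto LabelX;
  goto LabelY;
LabelX:
  return 1;
LabelY:
  return 0;
}

// Shape after the fold: a single branch on the OR of the conditions.
int afterFold(bool a, bool b, bool c) {
  if (a | b | c) goto LabelX;
  goto LabelY;
LabelX:
  return 1;
LabelY:
  return 0;
}

// Exhaustively compares both shapes over all 8 input combinations.
bool shapesAgree() {
  for (int m = 0; m < 8; ++m) {
    bool a = (m & 1) != 0, b = (m & 2) != 0, c = (m & 4) != 0;
    if (beforeFold(a, b, c) != afterFold(a, b, c)) return false;
  }
  return true;
}
```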

FoldValueComparisonIntoPredecessors transforms code that looks like:

if (x == A) goto LabelX
if (x == B) goto LabelY
if (x == C) goto LabelZ
...

into:

switch(x) {
case A: goto LabelX
case B: goto LabelY
case C: goto LabelZ
...
}

where A, B, C, etc are constant values.
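A rough source-level analogue, offered as a sketch rather than the pass's actual IR output: a chain of equality tests of the same value against constants collapses into one multi-way branch, written here as a C++ `switch`. The constants 10, 20, 30 and the return codes are made up for illustration, as are the function names.

```cpp
#include <cassert>

// Chain of equality comparisons on the same value, each with its own target.
int chain(int x) {
  if (x == 10) return 1;  // stands in for LabelX
  if (x == 20) return 2;  // stands in for LabelY
  if (x == 30) return 3;  // stands in for LabelZ
  return 0;               // no comparison matched
}

// The same decision expressed as a single multi-way branch.
int asSwitch(int x) {
  switch (x) {
    case 10: return 1;
    case 20: return 2;
    case 30: return 3;
    default: return 0;
  }
}
```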

In situations where both transformations apply, FoldValueComparisonIntoPredecessors tends to be better, since LLVM can lower the switch more efficiently than the complex boolean expression generated by FoldingBranchToCommonDest.

Diff Detail

Event Timeline

tjablin updated this revision to Diff 60625. Jun 13 2016, 4:31 PM

tjablin retitled this revision to "Reorder SimplifyCFG and SROA?".

tjablin updated this object.

tjablin added reviewers: hfinkel, cycheng, chandlerc.

tjablin added a subscriber: llvm-commits.

Herald added a subscriber: mehdi_amini. Jun 13 2016, 4:31 PM

Rather than change the ordering, what about teaching SimplifyCFG's FoldValueComparisonIntoPredecessors logic to handle code in the pattern produced by FoldingBranchToCommonDest? That would seem like a good canonicalization change anyways.

Hi Chandler,
The FoldValueComparisonIntoPredecessors logic already mostly understands the code pattern produced by FoldingBranchToCommonDest, but would need to duplicate most of the logic from EarlyCSE to get the rest of the way there. The current pass order is: CFGSimplification, SROA, EarlyCSE. If either EarlyCSE or SROA runs before SimplifyCFG, there's no problem. Alternatively, a second pass through SimplifyCFG will also generate good code as long as it runs before InstCombine. InstCombine is problematic since it "strength reduces" some equality comparisons to bitwise operations. For example:

(i == 5334 || i == 5335)

becomes

((i | 1) == 5335)
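The two forms are equivalent because 5334 and 5335 differ only in the lowest bit, so setting that bit maps both values to 5335 and maps no other value there. A small brute-force check of this, with hypothetical function names and an arbitrarily chosen range:

```cpp
#include <cassert>
#include <cstdint>

// The original pair of equality comparisons.
bool asComparisons(uint32_t i) { return i == 5334 || i == 5335; }

// The or-mask form that the comparisons are reduced to.
bool asOrMask(uint32_t i) { return (i | 1) == 5335; }

// Checks that both forms agree on every value in [lo, hi].
// hi must be below UINT32_MAX so the loop terminates.
bool formsAgree(uint32_t lo, uint32_t hi) {
  for (uint32_t i = lo; i <= hi; ++i)
    if (asComparisons(i) != asOrMask(i)) return false;
  return true;
}
```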


If this is how the equality expression is canonicalized by instcombine, then this is how simplify-cfg needs to match it.

If there is logic elsewhere that should be commonly factored out to help build the equality set out of this reduced form, we should extract it and use it in both places.

If this simply cannot be reasonably matched afterward, then instcombine is destroying information rather than canonicalizing, and we should teach it to use a different canonical form in order to make matching on these patterns easier.
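One way the equality set could be rebuilt from the reduced form, sketched under stated assumptions: for `(x | Mask) == C`, every set bit of `Mask` must also be set in `C` (otherwise no value matches), and the matching values are `C` with any subset of the `Mask` bits cleared. The helper `expandOrMask` below is hypothetical and is not taken from SimplifyCFG or InstCombine; it only illustrates the arithmetic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Returns, in ascending order, every x for which (x | mask) == c.
// Hypothetical helper for illustration, not an LLVM API.
std::vector<uint32_t> expandOrMask(uint32_t mask, uint32_t c) {
  std::vector<uint32_t> values;
  // If c lacks any bit of mask, (x | mask) can never equal c.
  if ((c & mask) != mask) return values;
  uint32_t base = c & ~mask;  // bits of x that are fixed by c
  // Enumerate every submask of mask, from mask itself down to 0.
  for (uint32_t sub = mask;; sub = (sub - 1) & mask) {
    values.push_back(base | sub);
    if (sub == 0) break;
  }
  std::sort(values.begin(), values.end());
  return values;
}
```

The result has one entry per subset of the mask bits, so its size doubles with each additional mask bit; a real matcher would presumably want to cap the number of bits it is willing to expand.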

Did you run this through the test-suite?

test/Transforms/PhaseOrdering/branch-to-switch.ll
6	What this test is checking is totally unclear, can you document it please?

tjablin abandoned this revision. Jun 15 2016, 12:41 PM

tjablin mentioned this in D21397: Teach SimplifyCFG to Create Switches from InstCombine Or Mask'd Comparisons.

cycheng mentioned this in rL273639: Teaching SimplifyCFG to recognize the Or-Mask trick that InstCombine uses to. Jun 23 2016, 7:06 PM

Revision Contents

Path	Size
lib/Transforms/IPO/PassManagerBuilder.cpp	2 lines
test/Transforms/PhaseOrdering/branch-to-switch.ll	219 lines

Diff 60625

lib/Transforms/IPO/PassManagerBuilder.cpp

@@ void PassManagerBuilder::populateFunctionPassManager(
   // Add LibraryInfo if we have some.
   if (LibraryInfo)
     FPM.add(new TargetLibraryInfoWrapperPass(*LibraryInfo));

   if (OptLevel == 0) return;

   addInitialAliasAnalysisPasses(FPM);

-  FPM.add(createCFGSimplificationPass());
   if (UseNewSROA)
     FPM.add(createSROAPass());
   else
     FPM.add(createScalarReplAggregatesPass());
+  FPM.add(createCFGSimplificationPass());
   FPM.add(createEarlyCSEPass());
   FPM.add(createLowerExpectIntrinsicPass());
 }

 // Do PGO instrumentation generation or use pass as the option specified.
 void PassManagerBuilder::addPGOInstrPasses(legacy::PassManagerBase &MPM) {
   if (!PGOInstrGen.empty()) {
     MPM.add(createPGOInstrumentationGenLegacyPass());

test/Transforms/PhaseOrdering/branch-to-switch.ll

				; RUN: opt -O1 -S < %s | FileCheck %s

				; CHECK-LABEL: @foo
				; CHECK: switch i32 %i
				; CHECK: br
				; CHECK-NOT: br
				define signext i32 @foo(i32 signext %i) {
				entry:
				%i.addr = alloca i32, align 4
				store i32 %i, i32* %i.addr, align 4
				%0 = load i32, i32* %i.addr, align 4
				%cmp = icmp eq i32 %0, 5548
				br i1 %cmp, label %lor.end, label %lor.lhs.false

				lor.lhs.false: ; preds = %entry
				%1 = load i32, i32* %i.addr, align 4
				%cmp1 = icmp eq i32 %1, 6374
				br i1 %cmp1, label %lor.end, label %lor.lhs.false2

				lor.lhs.false2: ; preds = %lor.lhs.false
				%2 = load i32, i32* %i.addr, align 4
				%cmp3 = icmp eq i32 %2, 5595
				br i1 %cmp3, label %lor.end, label %lor.lhs.false4

				lor.lhs.false4: ; preds = %lor.lhs.false2
				%3 = load i32, i32* %i.addr, align 4
				%cmp5 = icmp eq i32 %3, 8625
				br i1 %cmp5, label %lor.end, label %lor.lhs.false6

				lor.lhs.false6: ; preds = %lor.lhs.false4
				%4 = load i32, i32* %i.addr, align 4
				%cmp7 = icmp eq i32 %4, 8621
				br i1 %cmp7, label %lor.end, label %lor.lhs.false8

				lor.lhs.false8: ; preds = %lor.lhs.false6
				%5 = load i32, i32* %i.addr, align 4
				%cmp9 = icmp eq i32 %5, 8622
				br i1 %cmp9, label %lor.end, label %lor.lhs.false10

				lor.lhs.false10: ; preds = %lor.lhs.false8
				%6 = load i32, i32* %i.addr, align 4
				%cmp11 = icmp eq i32 %6, 6373
				br i1 %cmp11, label %lor.end, label %lor.lhs.false12

				lor.lhs.false12: ; preds = %lor.lhs.false10
				%7 = load i32, i32* %i.addr, align 4
				%cmp13 = icmp eq i32 %7, 6073
				br i1 %cmp13, label %lor.end, label %lor.lhs.false14

				lor.lhs.false14: ; preds = %lor.lhs.false12
				%8 = load i32, i32* %i.addr, align 4
				%cmp15 = icmp eq i32 %8, 5568
				br i1 %cmp15, label %lor.end, label %lor.lhs.false16

				lor.lhs.false16: ; preds = %lor.lhs.false14
				%9 = load i32, i32* %i.addr, align 4
				%cmp17 = icmp eq i32 %9, 5549
				br i1 %cmp17, label %lor.end, label %lor.lhs.false18

				lor.lhs.false18: ; preds = %lor.lhs.false16
				%10 = load i32, i32* %i.addr, align 4
				%cmp19 = icmp eq i32 %10, 8623
				br i1 %cmp19, label %lor.end, label %lor.lhs.false20

				lor.lhs.false20: ; preds = %lor.lhs.false18
				%11 = load i32, i32* %i.addr, align 4
				%cmp21 = icmp eq i32 %11, 8624
				br i1 %cmp21, label %lor.end, label %lor.lhs.false22

				lor.lhs.false22: ; preds = %lor.lhs.false20
				%12 = load i32, i32* %i.addr, align 4
				%cmp23 = icmp eq i32 %12, 5569
				br i1 %cmp23, label %lor.end, label %lor.lhs.false24

				lor.lhs.false24: ; preds = %lor.lhs.false22
				%13 = load i32, i32* %i.addr, align 4
				%cmp25 = icmp eq i32 %13, 5544
				br i1 %cmp25, label %lor.end, label %lor.lhs.false26

				lor.lhs.false26: ; preds = %lor.lhs.false24
				%14 = load i32, i32* %i.addr, align 4
				%cmp27 = icmp eq i32 %14, 7727
				br i1 %cmp27, label %lor.end, label %lor.lhs.false28

				lor.lhs.false28: ; preds = %lor.lhs.false26
				%15 = load i32, i32* %i.addr, align 4
				%cmp29 = icmp eq i32 %15, 7746
				br i1 %cmp29, label %lor.end, label %lor.lhs.false30

				lor.lhs.false30: ; preds = %lor.lhs.false28
				%16 = load i32, i32* %i.addr, align 4
				%cmp31 = icmp eq i32 %16, 5570
				br i1 %cmp31, label %lor.end, label %lor.lhs.false32

				lor.lhs.false32: ; preds = %lor.lhs.false30
				%17 = load i32, i32* %i.addr, align 4
				%cmp33 = icmp eq i32 %17, 5543
				br i1 %cmp33, label %lor.end, label %lor.lhs.false34

				lor.lhs.false34: ; preds = %lor.lhs.false32
				%18 = load i32, i32* %i.addr, align 4
				%cmp35 = icmp eq i32 %18, 7728
				br i1 %cmp35, label %lor.end, label %lor.lhs.false36

				lor.lhs.false36: ; preds = %lor.lhs.false34
				%19 = load i32, i32* %i.addr, align 4
				%cmp37 = icmp eq i32 %19, 7747
				br i1 %cmp37, label %lor.end, label %lor.lhs.false38

				lor.lhs.false38: ; preds = %lor.lhs.false36
				%20 = load i32, i32* %i.addr, align 4
				%cmp39 = icmp eq i32 %20, 6026
				br i1 %cmp39, label %lor.end, label %lor.lhs.false40

				lor.lhs.false40: ; preds = %lor.lhs.false38
				%21 = load i32, i32* %i.addr, align 4
				%cmp41 = icmp eq i32 %21, 6027
				br i1 %cmp41, label %lor.end, label %lor.lhs.false42

				lor.lhs.false42: ; preds = %lor.lhs.false40
				%22 = load i32, i32* %i.addr, align 4
				%cmp43 = icmp eq i32 %22, 374
				br i1 %cmp43, label %lor.end, label %lor.lhs.false44

				lor.lhs.false44: ; preds = %lor.lhs.false42
				%23 = load i32, i32* %i.addr, align 4
				%cmp45 = icmp eq i32 %23, 5333
				br i1 %cmp45, label %lor.end, label %lor.lhs.false46

				lor.lhs.false46: ; preds = %lor.lhs.false44
				%24 = load i32, i32* %i.addr, align 4
				%cmp47 = icmp eq i32 %24, 5332
				br i1 %cmp47, label %lor.end, label %lor.lhs.false48

				lor.lhs.false48: ; preds = %lor.lhs.false46
				%25 = load i32, i32* %i.addr, align 4
				%cmp49 = icmp eq i32 %25, 1027
				br i1 %cmp49, label %lor.end, label %lor.lhs.false50

				lor.lhs.false50: ; preds = %lor.lhs.false48
				%26 = load i32, i32* %i.addr, align 4
				%cmp51 = icmp eq i32 %26, 5337
				br i1 %cmp51, label %lor.end, label %lor.lhs.false52

				lor.lhs.false52: ; preds = %lor.lhs.false50
				%27 = load i32, i32* %i.addr, align 4
				%cmp53 = icmp eq i32 %27, 5336
				br i1 %cmp53, label %lor.end, label %lor.lhs.false54

				lor.lhs.false54: ; preds = %lor.lhs.false52
				%28 = load i32, i32* %i.addr, align 4
				%cmp55 = icmp eq i32 %28, 8781
				br i1 %cmp55, label %lor.end, label %lor.lhs.false56

				lor.lhs.false56: ; preds = %lor.lhs.false54
				%29 = load i32, i32* %i.addr, align 4
				%cmp57 = icmp eq i32 %29, 8783
				br i1 %cmp57, label %lor.end, label %lor.lhs.false58

				lor.lhs.false58: ; preds = %lor.lhs.false56
				%30 = load i32, i32* %i.addr, align 4
				%cmp59 = icmp eq i32 %30, 8782
				br i1 %cmp59, label %lor.end, label %lor.lhs.false60

				lor.lhs.false60: ; preds = %lor.lhs.false58
				%31 = load i32, i32* %i.addr, align 4
				%cmp61 = icmp eq i32 %31, 8784
				br i1 %cmp61, label %lor.end, label %lor.lhs.false62

				lor.lhs.false62: ; preds = %lor.lhs.false60
				%32 = load i32, i32* %i.addr, align 4
				%cmp63 = icmp eq i32 %32, 2347
				br i1 %cmp63, label %lor.end, label %lor.lhs.false64

				lor.lhs.false64: ; preds = %lor.lhs.false62
				%33 = load i32, i32* %i.addr, align 4
				%cmp65 = icmp eq i32 %33, 5339
				br i1 %cmp65, label %lor.end, label %lor.lhs.false66

				lor.lhs.false66: ; preds = %lor.lhs.false64
				%34 = load i32, i32* %i.addr, align 4
				%cmp67 = icmp eq i32 %34, 5338
				br i1 %cmp67, label %lor.end, label %lor.lhs.false68

				lor.lhs.false68: ; preds = %lor.lhs.false66
				%35 = load i32, i32* %i.addr, align 4
				%cmp69 = icmp eq i32 %35, 3856
				br i1 %cmp69, label %lor.end, label %lor.lhs.false70

				lor.lhs.false70: ; preds = %lor.lhs.false68
				%36 = load i32, i32* %i.addr, align 4
				%cmp71 = icmp eq i32 %36, 5335
				br i1 %cmp71, label %lor.end, label %lor.lhs.false72

				lor.lhs.false72: ; preds = %lor.lhs.false70
				%37 = load i32, i32* %i.addr, align 4
				%cmp73 = icmp eq i32 %37, 5334
				br i1 %cmp73, label %lor.end, label %lor.lhs.false74

				lor.lhs.false74: ; preds = %lor.lhs.false72
				%38 = load i32, i32* %i.addr, align 4
				%cmp75 = icmp eq i32 %38, 5343
				br i1 %cmp75, label %lor.end, label %lor.lhs.false76

				lor.lhs.false76: ; preds = %lor.lhs.false74
				%39 = load i32, i32* %i.addr, align 4
				%cmp77 = icmp eq i32 %39, 5342
				br i1 %cmp77, label %lor.end, label %lor.rhs

				lor.rhs: ; preds = %lor.lhs.false76
				%40 = load i32, i32* %i.addr, align 4
				%cmp78 = icmp eq i32 %40, 4775
				br label %lor.end

				lor.end: ; preds = %lor.rhs, %lor.lhs.false76, %lor.lhs.false74, %lor.lhs.false72, %lor.lhs.false70, %lor.lhs.false68, %lor.lhs.false66, %lor.lhs.false64, %lor.lhs.false62, %lor.lhs.false60, %lor.lhs.false58, %lor.lhs.false56, %lor.lhs.false54, %lor.lhs.false52, %lor.lhs.false50, %lor.lhs.false48, %lor.lhs.false46, %lor.lhs.false44, %lor.lhs.false42, %lor.lhs.false40, %lor.lhs.false38, %lor.lhs.false36, %lor.lhs.false34, %lor.lhs.false32, %lor.lhs.false30, %lor.lhs.false28, %lor.lhs.false26, %lor.lhs.false24, %lor.lhs.false22, %lor.lhs.false20, %lor.lhs.false18, %lor.lhs.false16, %lor.lhs.false14, %lor.lhs.false12, %lor.lhs.false10, %lor.lhs.false8, %lor.lhs.false6, %lor.lhs.false4, %lor.lhs.false2, %lor.lhs.false, %entry
				%41 = phi i1 [ true, %lor.lhs.false76 ], [ true, %lor.lhs.false74 ], [ true, %lor.lhs.false72 ], [ true, %lor.lhs.false70 ], [ true, %lor.lhs.false68 ], [ true, %lor.lhs.false66 ], [ true, %lor.lhs.false64 ], [ true, %lor.lhs.false62 ], [ true, %lor.lhs.false60 ], [ true, %lor.lhs.false58 ], [ true, %lor.lhs.false56 ], [ true, %lor.lhs.false54 ], [ true, %lor.lhs.false52 ], [ true, %lor.lhs.false50 ], [ true, %lor.lhs.false48 ], [ true, %lor.lhs.false46 ], [ true, %lor.lhs.false44 ], [ true, %lor.lhs.false42 ], [ true, %lor.lhs.false40 ], [ true, %lor.lhs.false38 ], [ true, %lor.lhs.false36 ], [ true, %lor.lhs.false34 ], [ true, %lor.lhs.false32 ], [ true, %lor.lhs.false30 ], [ true, %lor.lhs.false28 ], [ true, %lor.lhs.false26 ], [ true, %lor.lhs.false24 ], [ true, %lor.lhs.false22 ], [ true, %lor.lhs.false20 ], [ true, %lor.lhs.false18 ], [ true, %lor.lhs.false16 ], [ true, %lor.lhs.false14 ], [ true, %lor.lhs.false12 ], [ true, %lor.lhs.false10 ], [ true, %lor.lhs.false8 ], [ true, %lor.lhs.false6 ], [ true, %lor.lhs.false4 ], [ true, %lor.lhs.false2 ], [ true, %lor.lhs.false ], [ true, %entry ], [ %cmp78, %lor.rhs ]
				%lor.ext = zext i1 %41 to i32
				ret i32 %lor.ext
				}
				No newline at end of file