This is an archive of the discontinued LLVM Phabricator instance.

Differential D104870

[SimplifyCFG] Tail-merging all blocks with `unreachable` terminator
AbandonedPublic

Authored by lebedev.ri on Jun 24 2021, 11:42 AM.

Details

Reviewers

rnk
dmgreen
fhahn
nikic
hans
arsenm
davidxl

Summary

Unlike ret/resume, for unreachable currently no such tail-merging
is being performed by clang. There has been at least one previous attempt
at this - D29428.

@rnk noted in D104445, this may be somewhat problematic for certain backends
that "form regions later in the backend", but as per the feedback in D104870
this appears to be a non-problem.

As for the motivation for this, the goal basically is to de-pessimize(*)
the source code that either uses exceptions, and/or is built with assertions,
and/or with ASAN/UBSan instrumentation in warnings-as-errors mode.

In all of these cases there are a few function-terminating IR patterns,
but they are repeated over and over again, bloating the function size,
which naturally counts towards inlining cost, which naturally lowers
the chances of functions compiled under such conditions being inlined.
(* the insufficient inlining is the pessimization)

As discussed in D101231, while we could technically teach the inliner to exempt
such function-terminating blocks from the cost calculation,
we don't really want to do that, because we'd end up with even more cold code.
But instead we can achieve much the same goal by decreasing the inline cost
via tail-merging & sinking common code.

All that being said, this is expected to result in *bigger* codesizes,
both because the inliner actually did something now,
and because ultimately the tail duplication in the backend might have
undone what we have done here.
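
To make the size argument concrete, here is a minimal sketch of how merging identical `unreachable`-terminated blocks shrinks a function's instruction count. It is a toy model, not LLVM code: blocks are plain lists of strings, the helper names `instructionCount` and `instructionCountAfterMerge` are hypothetical, and the raw instruction count is only a crude stand-in for the inliner's real cost model.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy model (assumption): a basic block is just a list of instruction strings.
using Block = std::vector<std::string>;

// Total instruction count, used here as a crude stand-in for inline cost.
std::size_t instructionCount(const std::vector<Block> &Blocks) {
  std::size_t N = 0;
  for (const Block &B : Blocks)
    N += B.size();
  return N;
}

// Instruction count after folding identical blocks that end in `unreachable`
// into a single shared block. The first copy is kept; later copies are
// assumed to disappear because their predecessors get retargeted.
std::size_t instructionCountAfterMerge(const std::vector<Block> &Blocks) {
  std::map<Block, std::size_t> Copies; // identical unreachable-terminated blocks seen so far
  std::size_t N = 0;
  for (const Block &B : Blocks) {
    bool EndsInUnreachable = !B.empty() && B.back() == "unreachable";
    if (!EndsInUnreachable) {
      N += B.size();
      continue;
    }
    if (Copies[B]++ == 0)
      N += B.size(); // only the first copy is counted
  }
  return N;
}
```

For a function shaped like the `merge_simple` test in this revision (three identical assertion blocks of two instructions each), the model drops four instructions, which is exactly the kind of reduction that can push a function under an inlining threshold.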

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lebedev.ri created this revision. Jun 24 2021, 11:42 AM

Herald added a subscriber: hiraditya. Jun 24 2021, 11:42 AM

lebedev.ri requested review of this revision. Jun 24 2021, 11:42 AM

lebedev.ri added a reviewer: arsenm. Jun 24 2021, 12:25 PM

Herald added a subscriber: wdng. Jun 24 2021, 12:25 PM

Yes, AMDGPU has to deal with this anyway. We have the AMDGPUUnifyDivergentExitNodes pass, which essentially does the same thing to merge unreachables.

In D104870#2839379, @arsenm wrote:

Yes, AMDGPU has to deal with this anyway. We have the AMDGPUUnifyDivergentExitNodes pass, which essentially does the same thing to merge unreachables.

@arsenm thank you for commenting!

Harbormaster completed remote builds in B110877: Diff 354321. Jun 24 2021, 12:38 PM

This change deserves some discussion. To address the issues with regions, we shouldn't rely on my hazily remembered understandings, we should consult the experts.

For wasm, let's try @aheejin and @dschuff
For GPUs, let's try @nhaehnle and @arsenm
WinEH isn't an issue, we future proofed it against this transform
For debug info, let's try @aprantl and @jmorse

To those I just added to the review, the question is: will tail merging calls to noreturn functions in IR (BranchFolding already does it in codegen) be an issue for the subsystems you contribute to?

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	I expected your code to fire on this test case. Can you explain why this example isn't getting tail merged?

Consider this example: https://gcc.godbolt.org/z/ox16a9P1z

    [[noreturn]] void abort1();
    [[noreturn]] void abort2();
    [[noreturn]] void abort3();
    bool cond();
    void doAsserts() {
      if (cond())
        abort1();
      if (cond())
        abort2();
      if (cond())
        abort3();
    }

I think it is more canonical to leave these unreachable terminators in place after the calls to noreturn functions, rather than merging the unreachables together. I just want to make sure your transform isn't firing, creating BBs, and then a later part of simplifycfg rolls the unreachables back up into place after the calls.

lebedev.ri marked an inline comment as done. Jun 24 2021, 1:29 PM

lebedev.ri added inline comments.

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	> I expected your code to fire on this test case. Can you explain why this example isn't getting tail merged?

It fired, we didn't sink anything, and `SimplifyCFGOpt::simplifyUnreachable()` decided to undo it.

rnk added inline comments. Jun 24 2021, 2:08 PM

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	Got it, and we want to avoid that because otherwise it will make the overall pass return true to indicate that it changed something, which will make the parent pass manager re-run more passes.

lebedev.ri marked 2 inline comments as done. Jun 24 2021, 2:19 PM

lebedev.ri added inline comments.

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	True. As far as i'm aware that only results in potentially invalidating analyses; i'm not aware of it triggering additional optimization pass runs. IIRC that part of `SimplifyCFGOpt::simplifyUnreachable()` is a pretty important canonicalization, because e.g. instcombine can't modify the CFG.

Given that tail-merging already happens in some scenarios, I don't see any additional debug-info difficulties in doing it for Unreachable, over anything else. (For variable locations and source locations at least).

In D104870#2839468, @rnk wrote:

This change deserves some discussion. To address the issues with regions, we shouldn't rely on my hazily remembered understandings, we should consult the experts.

For wasm, let's try @aheejin and @dschuff

For GPUs, let's try @nhaehnle and @arsenm

WinEH isn't an issue, we future proofed it against this transform

For debug info, let's try @aprantl and @jmorse

To those I just added to the review, the question is: will tail merging calls to noreturn functions in IR (BranchFolding already does it in codegen) be an issue for the subsystems you contribute to?

Thanks for letting me know! I guess WinEH is fine because WinEH makes sure to clone blocks that belong to multiple funclets, right? I think Wasm should be OK as well as WinEH works, because we use WinEHPrepare, so we benefit from this transformation as well.

Thank you @arsenm, @jmorse & @aheejin!
Looks like, as i predicted, all affected parties already have to deal with such IR, so this isn't an issue.

ping @rnk

I think all my high level concerns are addressed then. I think it was worth letting all those folks know about this change. Users will also probably notice this change: it will affect the source location of lots of noreturn calls.

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	Even if it only invalidates analyses, I think this is worth addressing before landing this. Ideally this code would directly call the heuristic that "sink from common predecessors" uses, but if that isn't available, I think you could approximate it by not merging unreachable terminators when the previous non-debug instruction is a noreturn call with distinct callees. We know that is unprofitable, and that accounts for most blocks ending in unreachable. It saves compile time from IR churn too.

@aeubanks, what are the consequences of passes indicating that they changed the IR when they actually didn't?
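
One possible reading of the approximation suggested above, sketched as a toy model rather than against the real SimplifyCFG data structures: a block ending in `unreachable` stays a merge candidate unless it is preceded by a noreturn call whose callee no other such block shares. The `UnreachableBlock` struct and `mergeCandidates` function are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical summary of a block that ends in `unreachable`.
struct UnreachableBlock {
  std::string Name;
  // Callee of the noreturn call right before `unreachable` (debug
  // instructions ignored); empty if there is no such call.
  std::string NoReturnCallee;
};

// Returns the names of blocks that remain merge candidates. Blocks whose
// noreturn callee is not shared with any other block are filtered out,
// since sinking their calls would require an indirect call.
std::vector<std::string>
mergeCandidates(const std::vector<UnreachableBlock> &Blocks) {
  std::map<std::string, int> CalleeUses;
  for (const UnreachableBlock &B : Blocks)
    if (!B.NoReturnCallee.empty())
      ++CalleeUses[B.NoReturnCallee];

  std::vector<std::string> Candidates;
  for (const UnreachableBlock &B : Blocks)
    if (B.NoReturnCallee.empty() || CalleeUses[B.NoReturnCallee] > 1)
      Candidates.push_back(B.Name);
  return Candidates;
}
```

Under this reading, three blocks calling three different abort functions yield no candidates at all, while two blocks that call the same function are kept even if a third block calls something else.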

Thank you for taking a look.

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	This is impossible to address.

> I think you could approximate it by not merging unreachable terminators when the previous non-debug instruction is a noreturn call with distinct callees.

I can not, because fixing lack of sinking in such cases is basically the very next step here.

rnk added inline comments. Jun 29 2021, 12:58 PM

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	> I can not, because fixing lack of sinking in such cases is basically the very next step here.

This is the transform I'm talking about avoiding, and I don't think we plan to do this in the next step:

    bb1:
      call void @abort1()
      unreachable
    bb2:
      call void @abort2()
      unreachable
    ->
    bb1:
      br label %common
    bb2:
      br label %common
    common:
      %callee = phi ... @abort1 ... @abort2
      call void ... %callee()
      unreachable

Right? This would make direct calls indirect, which is less canonical. I doubt this is going to change soon.

> This is impossible to address.

I guess what you are saying is that this isn't possible to implement with the current code and data structures. We'd need to incorporate the structure of the instruction before unreachable into the map.

This makes me think maybe it would be better to extend the SinkFromCommonPredecessors logic to consider blocks ending in unreachable. The current code is essentially restructuring the CFG in a way that is convenient for that function. I think transforms should avoid changing the IR before they know if transformation is really profitable, and it seems like the profitability heuristic is over there.

lebedev.ri marked an inline comment as done. Jun 29 2021, 1:16 PM

lebedev.ri added inline comments.

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	> Right? This would make direct calls indirect, which is less canonical. I doubt this is going to change soon.

Right. This isn't going to change. However, consider

    bb1:
      call void @abort1()
      unreachable
    bb2:
      call void @abort2()
      unreachable
    bb3:
      call void @abort2()
      unreachable
    ->
    bb1:
      call void @abort1()
      unreachable
    bb2:
      br label %bb2.bb3.common
    bb2.bb3.common:
      call void @abort2()
      unreachable

Also, consider:

    bb1:
      call void @abort1()
      unreachable
    bb2:
      call void @abort1()
      br label %bb3
    bb3:
      unreachable

My point being, we can't realistically say that we will/won't succeed in sinking stuff.

aeubanks added inline comments. Jun 29 2021, 7:33 PM

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	there's no correctness issue with saying that we modified IR if we didn't actually, it'll just invalidate analyses, causing more work when they are recomputed in later passes

might be worth putting this through http://llvm-compile-time-tracker.com/

@aeubanks thank you for taking a look!

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	Since you asked, sure: https://llvm-compile-time-tracker.com/compare.php?from=1f169a774cb865659cefe085e70a56a884e3711e&to=fc54bb9a8ef85bd76dd9e934b2546f4beadc5b5e&stat=instructions

I'm not sure what this tells us here. Since the instruction stat correlates with the size changes, i guess we could say that it led to more inlining, and more IR to chew through. Which is pretty much the expected outcome.

nikic added inline comments. Jun 30 2021, 12:36 AM

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	This shows a 10% increase in code size on mafft with LTO and a few others also increase by multiple percent points. Did you rerun @rnk's test on clang Release+Assert code size with this patch? It looks like large code size increases are still the blocker for this patch, as they were back then.

lebedev.ri marked 2 inline comments as done. Jun 30 2021, 12:58 AM

lebedev.ri added inline comments.

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	> This shows a 10% increase in code size on mafft with LTO and a few others also increase by multiple percent points.

Yep.

> Did you rerun @rnk's test on clang Release+Assert code size with this patch?

I have not because there is no reason to expect that the outcome is different. (It will be somewhat different, because the approach is somewhat different)

> It looks like large code size increases are still the blocker for this patch, as they were back then.

I'm not quite sure how we arrive at this conclusion. Let me make a comparison: when one tries to paint something, it is expected that not only will said something get colored, but also that the paint will get used up. What i'm saying is that the effect this has is not unexpected, on the contrary, it is expected. We successfully decrease the amount of IR bloat by assertion blocks, decreasing the size of the functions they are in, and naturally that makes some of them more eligible for inlining. which happens, and increases codesize.

I would like to also call-back to the discussion in D101468, where we had very much the same discussion, and actually i argued that it was bad, but @nikic argued that said change is good since we no longer overestimate the inlining cost, and if that leads to an overestimation, then the problem is in inliner. I'm not sure how in this patch the views changed to diametrically opposite ones :)

(@dmgreen you might be interested in also looking at the size numbers on the benchmarks you track)

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	Forgot to mention: `"There are three kinds of lies: lies, damned lies, and statistics."`

The n% increase in code size is a pretty meaningless number, and i'm sad to see it being used in such a harsh blocking manner. What we should at least do, is look at how it compares with assert-less code. I don't yet have clang numbers, but here's some for RawSpeed:

    $ stat --printf="%s %n\n" build-release-*/src/utilities/rsbench/rsbench | sort
    17188264 build-release-new/src/utilities/rsbench/rsbench
    17234640 build-release-old/src/utilities/rsbench/rsbench
    17464336 build-release-with-asserts-new/src/utilities/rsbench/rsbench
    17508840 build-release-with-asserts-old/src/utilities/rsbench/rsbench

I.e. `-DNDEBUG`->`-UNDEBUG` is +1.6% increase, while `old`->`new` (i.e. this patch) causes -0.25% decrease. Let me get these numbers for clang...

lebedev.ri added inline comments. Jun 30 2021, 10:50 AM

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	And here's clang numbers.

    $ stat --printf="%s %n\n" build-release-*/bin/clang-13 | sort
    103032240 build-release-old/bin/clang-13
    103046136 build-release-new/bin/clang-13
    123732704 build-release-with-asserts-old/bin/clang-13
    123882984 build-release-with-asserts-new/bin/clang-13

I.e. `-DNDEBUG`->`-UNDEBUG` is +20% increase, while `old`->`new` (i.e. this patch) causes +0.1% increase for assert-ful build, and +0.01% for assert-less one.

But what this really tells us is that the numbers will vary depending on the underlying libc implementation. The thing is, glibc's `__assert_fail()` has 4 arguments (the stringified assertion, filename, line, function), and in worst-case scenario we'll need a PHI for each one of them, yet currently the profitability check only allows a single PHI. So to reproduce @rnk's numbers, he'd have to redo the test on whatever platform was used originally.

Another way to spell this, the regression will appear later when profitability check is tuned :)
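
The percentages quoted above can be re-derived from the byte counts in the `stat` output. A minimal sketch of that arithmetic follows; `percentChange` is a hypothetical helper name, not something from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Relative size change from Before to After, in percent.
// Positive means the binary grew, negative means it shrank.
double percentChange(std::uint64_t Before, std::uint64_t After) {
  return (static_cast<double>(After) - static_cast<double>(Before)) * 100.0 /
         static_cast<double>(Before);
}
```

Plugging in the clang-13 sizes gives roughly +20.1% for `percentChange(103032240, 123732704)` (enabling assertions), about +0.12% for `percentChange(123732704, 123882984)` (this patch, assert-ful build), and about +0.013% for `percentChange(103032240, 103046136)` (this patch, assert-less build), which matches the rounded figures in the comment.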

In D104870#2849503, @lebedev.ri wrote:

(@dmgreen you might be interested in also looking at the size numbers on the benchmarks you track)

There are some small changes at -Oz, but small enough to not worry about and not reliably better or worse. The codesize examples I'm running may not have a lot of unreachable terminators, being bare-metal programs. You don't tend to abort when you have no OS to do anything sensible with the abort.

rnk added inline comments. Jun 30 2021, 4:52 PM

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	My reading of the llvm compile time tracker results is that this patch may result in a modest (~1% or less) compile time increase. That could be in the noise. The cost may mostly come from the analysis invalidation, which could be avoided if this were implemented in SinkCodeFromCommonPredecessors. I don't consider this a hard blocker if that is onerous.

> The n% increase in code size is a pretty meaningless number, and i'm sad to see it being used in such a harsh blocking manner.

I think it is more constructive to think of the engagement here as early feedback. People will notice code size increases and provide feedback, it's just a question of when. Reviewers are trying to be helpful.

W.r.t. `__assert_fail`, consider that CodeGen tail duplication may ultimately re-duplicate all the calls to `__assert_fail`. If that's the case, we shouldn't do this transform: it would throw away source location information for no gain. The more phis we create, the more likely it is to trigger tail duplication in the backend. However, there are many common calls to `__assert_fail` from things like the llvm::cast template. These are typically inlined and can be tail merged with one phi for the message; the file, line, and function should all be the same.

Anyway, are you set on this approach, or would you consider the proposed alternative?
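
The phi-count argument can be illustrated with a small sketch. It assumes a four-argument assertion handler, as described for glibc's `__assert_fail` earlier in the thread, and a limit of one phi, as the thread says the current profitability check allows. `AssertCall`, `phisNeeded`, and `canSinkWithPhiLimit` are hypothetical names, and real sinking logic considers more than just argument differences.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One call site of a 4-argument assertion handler:
// stringified assertion, file name, line, function.
using AssertCall = std::array<std::string, 4>;

// Number of argument positions whose value differs between call sites.
// Each such position would need its own phi in the merged block.
std::size_t phisNeeded(const std::vector<AssertCall> &Calls) {
  if (Calls.empty())
    return 0;
  std::size_t N = 0;
  for (std::size_t Arg = 0; Arg < 4; ++Arg) {
    for (const AssertCall &C : Calls) {
      if (C[Arg] != Calls.front()[Arg]) {
        ++N;
        break; // this position already needs a phi
      }
    }
  }
  return N;
}

// Assumed profitability limit: sink only if at most MaxPhis phis are needed.
bool canSinkWithPhiLimit(const std::vector<AssertCall> &Calls,
                         std::size_t MaxPhis = 1) {
  return phisNeeded(Calls) <= MaxPhis;
}
```

Two call sites that differ only in the message need one phi and pass the assumed limit; two unrelated assertions in the same function differ at least in message and line, need two or more phis, and are rejected.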

lebedev.ri marked an inline comment as done. Jul 1 2021, 2:57 AM

lebedev.ri added inline comments.

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	(Note that the compile time numbers aren't final, because this lacks further profitability checks relaxation in sinking logic.)

> My reading of the llvm compile time tracker results is that this patch may result in a modest (~1% or less) compile time increase. That could be in the noise.

FWIW i do agree that this will obviously affect the compile time.

> The cost may mostly come from the analysis invalidation,

The avoidable portion of the cost. The cost that comes from succeeding in the transformation, and enabling whatever next transformation will remain.
I think i should mention that i believe that for any non-trivial function the simplifycfg will likely already report that it changed things, and for trivial things it shouldn't be too costly to recompute analysis. This may be a faulty view, but that is what i think.

> > The n% increase in code size is a pretty meaningless number, and i'm sad to see it being used in such a harsh blocking manner.
> I think it is more constructive to think of the engagement here as early feedback. People will notice code size increases and provide feedback, it's just a question of when. Reviewers are trying to be helpful.

Right, i agree, though it didn't quite read as such to me, but it may be again just that the tone doesn't quite always roundtrip perfectly through translation.

> which could be avoided if this were implemented in SinkCodeFromCommonPredecessors. I don't consider this a hard blocker if that is onerous.
> <...>
> Anyway, are you set on this approach, or would you consider the proposed alternative?

Hmm, wait, i lost the thought. What would be implemented where? The profitability check before tail-merging? Or not doing tail-merging eagerly/early, but instead when visiting a function-terminating block, scan the function for all the blocks with the same terminator and only tail-merge iff we can actually sink? The latter sounds like it will result in some quadratic behavior.

the description doesn't state why this is desirable
are there performance improvements? I would initially think that this is for code size, but the conversation indicates otherwise

nikic added inline comments. Jul 1 2021, 9:52 AM

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	> What i'm saying is that the effect this has is not unexpected, on the contrary, it is expected. We successfully decrease the amount of IR bloat by assertion blocks, decreasing the size of the functions they are in, and naturally that makes some of them more eligible for inlining. which happens, and increases codesize.

Just to double check: Has someone confirmed that the code size increases we see are indeed caused (or caused primarily) by the inlining interaction?

> I would like to also call-back to the discussion in D101468, where we had very much the same discussion, and actually i argued that it was bad, but @nikic argued that said change is good since we no longer overestimate the inlining cost, and if that leads to an overestimation, then the problem is in inliner. I'm not sure how in this patch the views changed to diametrically opposite ones :)

My view here hasn't really changed in principle, but the practical aspects here are quite different. D101468 had some code size wins, some losses, and an overall 0.1% regression. Here we see many regressions in the 1-10% range. D101468 was all of "the right thing to do", had a clear motivation for vectorization and had fairly limited code size impact. For this patch, it seems like "the right thing to do" on an abstract level (in the sense that we do the same thing for other terminators), but it also has a non-trivial code size impact and the overall motivation isn't clear to me. Or at least, the patch summary doesn't say what the larger motivation here is. I would have guessed that the motivation is to reduce code size by tail merging and sinking, but if in practice the reverse happens due to inlining interaction, then I'm not sure that really makes sense. Maybe all I'm really looking for is clarification on what the motivation / bigger picture here is?

> My reading of the llvm compile time tracker results is that this patch may result in a modest (~1% or less) compile time increase. That could be in the noise.

@rnk: The way to read the numbers is: Anything colored is likely not noise. For most benchmarks, the noise floor is <0.05%.

Tried to update the description to talk about the motivation.

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	> > What i'm saying is that the effect this has is not unexpected, on the contrary, it is expected. We successfully decrease the amount of IR bloat by assertion blocks, decreasing the size of the functions they are in, and naturally that makes some of them more eligible for inlining. which happens, and increases codesize.
> Just to double check: Has someone confirmed that the code size increases we see are indeed caused (or caused primarily) by the inlining interaction?

I have just compared the stats for the vanilla test-suite as a whole, and 7zip specifically: while there is an increase in assembly instruction count, and increase in tail duplication, there is an increase in the count of IR instructions and decrease of the function/block count at the end of middle-end pipeline, and finally more `inline.NumInlined`. So i'm going to go with "yes".

are there any benchmarks (public or not) that show benefits with this patch?

lebedev.ri mentioned this in D105363: [InstCombine] Transitively propagate `unreachable` into predecessors. Jul 3 2021, 1:24 AM

lebedev.ri mentioned this in D116692: [SimplifyCFG] Tail-merging all blocks with `unreachable` terminator, final take. Jan 5 2022, 1:13 PM

lebedev.ri added inline comments. Jan 5 2022, 1:17 PM

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
129–130	@rnk i've finally implemented the "lossless" variant that we have discussed here, in https://reviews.llvm.org/D116692

I think it's rather ugly and unprecedented, but let's see what https://llvm-compile-time-tracker.com/compare.php?from=2353e1c87b09c20e75f0f3ceb05fa4a4261fe3dd&to=bed7b8df4565f4503889a19235e853b985ca3481&stat=instructions says...

lebedev.ri abandoned this revision. Oct 18 2022, 5:46 PM

Herald added a project: Restricted Project. Oct 18 2022, 5:46 PM

nikic mentioned this in D140605: Support unreachable instructions in SimplifyCFG's tail merging. Dec 23 2022, 12:23 AM

Revision Contents

Path	Size
llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp	1 line
llvm/test/CodeGen/Thumb2/setjmp_longjmp.ll	59 lines
llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll	150 lines

Diff 354321

llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp

[... 93 lines not shown ...]	for (BasicBlock &BB : F) {

auto *Term = BB.getTerminator();		auto *Term = BB.getTerminator();

// Fow now only support `ret`/`resume` function terminators.		// Fow now only support `ret`/`resume` function terminators.
// FIXME: lift this restriction.		// FIXME: lift this restriction.
switch (Term->getOpcode()) {		switch (Term->getOpcode()) {
case Instruction::Ret:		case Instruction::Ret:
case Instruction::Resume:		case Instruction::Resume:
		case Instruction::Unreachable:
break;		break;
default:		default:
continue;		continue;
}		}

// We can't tail-merge block that contains a musttail call.		// We can't tail-merge block that contains a musttail call.
if (BB.getTerminatingMustTailCall())		if (BB.getTerminatingMustTailCall())
continue;		continue;
[... 294 lines not shown ...]
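
The hunk above only shows the filter: blocks are skipped unless they end in `ret`, `resume`, or (with this patch) `unreachable`, and blocks ending in a musttail call are skipped as well. A toy sketch of that filter follows. The grouping of surviving blocks by terminator kind is an assumption about what the elided code does next, and `TermKind`, `ToyBlock`, and `collectMergeCandidates` are hypothetical names, not LLVM API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for LLVM's terminator opcodes (assumption, not LLVM API).
enum class TermKind { Ret, Resume, Unreachable, Br };

struct ToyBlock {
  std::string Name;
  TermKind Term;
  bool EndsInMustTailCall;
};

// Mirrors the filter in the hunk: keep only function-terminating blocks of a
// supported kind that do not end in a musttail call. Grouping the survivors
// by terminator kind is assumed for illustration.
std::map<TermKind, std::vector<std::string>>
collectMergeCandidates(const std::vector<ToyBlock> &F) {
  std::map<TermKind, std::vector<std::string>> Buckets;
  for (const ToyBlock &BB : F) {
    switch (BB.Term) {
    case TermKind::Ret:
    case TermKind::Resume:
    case TermKind::Unreachable: // the one-line addition in this patch
      break;
    default:
      continue; // not a supported function terminator
    }
    if (BB.EndsInMustTailCall)
      continue; // cannot tail-merge a block containing a musttail call
    Buckets[BB.Term].push_back(BB.Name);
  }
  return Buckets;
}
```

With the `Unreachable` case removed, the assertion blocks in the tests of this revision would never reach the grouping step, which is why the change to the pass itself is a single line.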

llvm/test/CodeGen/Thumb2/setjmp_longjmp.ll

	Show All 19 Lines
	; CHECK-NEXT: str.w sp, [sp, #12]			; CHECK-NEXT: str.w sp, [sp, #12]
	; CHECK-NEXT: mov r1, pc @ eh_setjmp begin			; CHECK-NEXT: mov r1, pc @ eh_setjmp begin
	; CHECK-NEXT: adds r1, r1, #7			; CHECK-NEXT: adds r1, r1, #7
	; CHECK-NEXT: str r1, [r0, #4]			; CHECK-NEXT: str r1, [r0, #4]
	; CHECK-NEXT: movs r0, #0			; CHECK-NEXT: movs r0, #0
	; CHECK-NEXT: b LSJLJEH0			; CHECK-NEXT: b LSJLJEH0
	; CHECK-NEXT: movs r0, #1 @ eh_setjmp end			; CHECK-NEXT: movs r0, #1 @ eh_setjmp end
	; CHECK-NEXT: LSJLJEH0:			; CHECK-NEXT: LSJLJEH0:
	; CHECK-NEXT: cbz r0, LBB0_3			; CHECK-NEXT: movw r1, :lower16:(L_g$non_lazy_ptr-(LPC0_0+4))
	; CHECK-NEXT: @ %bb.1: @ %if.then			; CHECK-NEXT: movt r1, :upper16:(L_g$non_lazy_ptr-(LPC0_0+4))
	; CHECK-NEXT: movw r0, :lower16:(L_g$non_lazy_ptr-(LPC0_0+4))
	; CHECK-NEXT: movt r0, :upper16:(L_g$non_lazy_ptr-(LPC0_0+4))
	; CHECK-NEXT: LPC0_0:			; CHECK-NEXT: LPC0_0:
	; CHECK-NEXT: add r0, pc			; CHECK-NEXT: add r1, pc
	; CHECK-NEXT: ldr r1, [r0]			; CHECK-NEXT: cbz r0, LBB0_4
				; CHECK-NEXT: @ %bb.1: @ %if.then
				; CHECK-NEXT: ldr r2, [r1]
	; CHECK-NEXT: movs r0, #1			; CHECK-NEXT: movs r0, #1
	; CHECK-NEXT: str r1, [sp] @ 4-byte Spill			; CHECK-NEXT: str r2, [sp] @ 4-byte Spill
	; CHECK-NEXT: str r0, [r1]
	; CHECK-NEXT: add r0, sp, #4
	; CHECK-NEXT: movs r1, #0			; CHECK-NEXT: movs r1, #0
				; CHECK-NEXT: str r0, [r2]
				; CHECK-NEXT: add r0, sp, #4
	; CHECK-NEXT: str r7, [sp, #4]			; CHECK-NEXT: str r7, [sp, #4]
	; CHECK-NEXT: str.w sp, [sp, #12]			; CHECK-NEXT: str.w sp, [sp, #12]
	; CHECK-NEXT: mov r1, pc @ eh_setjmp begin			; CHECK-NEXT: mov r1, pc @ eh_setjmp begin
	; CHECK-NEXT: adds r1, r1, #7			; CHECK-NEXT: adds r1, r1, #7
	; CHECK-NEXT: str r1, [r0, #4]			; CHECK-NEXT: str r1, [r0, #4]
	; CHECK-NEXT: movs r0, #0			; CHECK-NEXT: movs r0, #0
	; CHECK-NEXT: b LSJLJEH1			; CHECK-NEXT: b LSJLJEH1
	; CHECK-NEXT: movs r0, #1 @ eh_setjmp end			; CHECK-NEXT: movs r0, #1 @ eh_setjmp end
	; CHECK-NEXT: LSJLJEH1:			; CHECK-NEXT: LSJLJEH1:
	; CHECK-NEXT: cmp r0, #0			; CHECK-NEXT: cmp r0, #0
	; CHECK-NEXT: itttt ne			; CHECK-NEXT: itttt ne
	; CHECK-NEXT: movne r0, #3			; CHECK-NEXT: movne r0, #3
	; CHECK-NEXT: ldrne r1, [sp] @ 4-byte Reload			; CHECK-NEXT: ldrne r1, [sp] @ 4-byte Reload
	; CHECK-NEXT: strne r0, [r1]			; CHECK-NEXT: strne r0, [r1]
	; CHECK-NEXT: addne sp, #24			; CHECK-NEXT: addne sp, #24
	; CHECK-NEXT: it ne			; CHECK-NEXT: it ne
	; CHECK-NEXT: popne.w {r4, r5, r6, r7, r8, r10, r11, pc}			; CHECK-NEXT: popne.w {r4, r5, r6, r7, r8, r10, r11, pc}
	; CHECK-NEXT: LBB0_2: @ %if2.else			; CHECK-NEXT: LBB0_2:
	; CHECK-NEXT: ldr r1, [sp] @ 4-byte Reload			; CHECK-NEXT: movw r1, :lower16:(L_g$non_lazy_ptr-(LPC0_1+4))
				; CHECK-NEXT: add r2, sp, #4
				; CHECK-NEXT: movt r1, :upper16:(L_g$non_lazy_ptr-(LPC0_1+4))
	; CHECK-NEXT: movs r0, #2			; CHECK-NEXT: movs r0, #2
				; CHECK-NEXT: LPC0_1:
				; CHECK-NEXT: add r1, pc
				; CHECK-NEXT: LBB0_3: @ %common.unreachable
				; CHECK-NEXT: ldr r1, [r1]
	; CHECK-NEXT: str r0, [r1]			; CHECK-NEXT: str r0, [r1]
	; CHECK-NEXT: add r1, sp, #4
	; CHECK-NEXT: movs r0, #0			; CHECK-NEXT: movs r0, #0
	; CHECK-NEXT: ldr r0, [r1, #8]			; CHECK-NEXT: ldr r0, [r2, #8]
	; CHECK-NEXT: mov sp, r0			; CHECK-NEXT: mov sp, r0
	; CHECK-NEXT: ldr r0, [r1, #4]			; CHECK-NEXT: ldr r0, [r2, #4]
	; CHECK-NEXT: ldr r7, [r1]			; CHECK-NEXT: ldr r7, [r2]
	; CHECK-NEXT: bx r0			; CHECK-NEXT: bx r0
	; CHECK-NEXT: LBB0_3: @ %if.else			; CHECK-NEXT: LBB0_4:
	; CHECK-NEXT: movw r0, :lower16:(L_g$non_lazy_ptr-(LPC0_1+4))			; CHECK-NEXT: add r2, sp, #4
	; CHECK-NEXT: movs r1, #0			; CHECK-NEXT: movs r0, #0
	; CHECK-NEXT: movt r0, :upper16:(L_g$non_lazy_ptr-(LPC0_1+4))			; CHECK-NEXT: b LBB0_3
	; CHECK-NEXT: LPC0_1:
	; CHECK-NEXT: add r0, pc
	; CHECK-NEXT: ldr r0, [r0]
	; CHECK-NEXT: str r1, [r0]
	; CHECK-NEXT: add r0, sp, #4
	; CHECK-NEXT: ldr r1, [r0, #8]
	; CHECK-NEXT: mov sp, r1
	; CHECK-NEXT: ldr r1, [r0, #4]
	; CHECK-NEXT: ldr r7, [r0]
	; CHECK-NEXT: bx r1
	entry:			entry:
	%buf = alloca [5 x i8*], align 4			%buf = alloca [5 x i8*], align 4
	%bufptr = bitcast [5 x i8*]* %buf to i8*			%bufptr = bitcast [5 x i8*]* %buf to i8*
	%arraydecay = getelementptr inbounds [5 x i8*], [5 x i8*]* %buf, i32 0, i32 0			%arraydecay = getelementptr inbounds [5 x i8*], [5 x i8*]* %buf, i32 0, i32 0

	%fa = tail call i8* @llvm.frameaddress(i32 0)			%fa = tail call i8* @llvm.frameaddress(i32 0)
	store i8* %fa, i8** %arraydecay, align 4			store i8* %fa, i8** %arraydecay, align 4
	%ss = tail call i8* @llvm.stacksave()			%ss = tail call i8* @llvm.stacksave()
	Show All 38 Lines

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py		; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
; RUN: opt -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -sink-common-insts -S < %s | FileCheck %s		; RUN: opt -simplifycfg -simplifycfg-require-and-preserve-domtree=1 -sink-common-insts -S < %s | FileCheck %s

; Test that we tail merge noreturn call blocks and phi constants properly.		; Test that we tail merge noreturn call blocks and phi constants properly.

declare void @abort()		declare void @abort()
declare void @assert_fail_1(i32)		declare void @assert_fail_1(i32)
declare void @assert_fail_1_alt(i32)		declare void @assert_fail_1_alt(i32)

define void @merge_simple() {		define void @merge_simple() {
; CHECK-LABEL: @merge_simple(		; CHECK-LABEL: @merge_simple(
; CHECK-NEXT: [[C1:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C1:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C1]], label [[CONT1:%.*]], label [[A1:%.*]]		; CHECK-NEXT: br i1 [[C1]], label [[CONT1:%.*]], label [[COMMON_UNREACHABLE:%.*]]
; CHECK: a1:		; CHECK: common.unreachable:
; CHECK-NEXT: call void @assert_fail_1(i32 0)		; CHECK-NEXT: call void @assert_fail_1(i32 0)
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
; CHECK: cont1:		; CHECK: cont1:
; CHECK-NEXT: [[C2:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C2:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C2]], label [[CONT2:%.*]], label [[A2:%.*]]		; CHECK-NEXT: br i1 [[C2]], label [[CONT2:%.*]], label [[COMMON_UNREACHABLE]]
; CHECK: a2:
; CHECK-NEXT: call void @assert_fail_1(i32 0)
; CHECK-NEXT: unreachable
; CHECK: cont2:		; CHECK: cont2:
; CHECK-NEXT: [[C3:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C3:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C3]], label [[CONT3:%.*]], label [[A3:%.*]]		; CHECK-NEXT: br i1 [[C3]], label [[CONT3:%.*]], label [[COMMON_UNREACHABLE]]
; CHECK: a3:
; CHECK-NEXT: call void @assert_fail_1(i32 0)
; CHECK-NEXT: unreachable
; CHECK: cont3:		; CHECK: cont3:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%c1 = call i1 @foo()		%c1 = call i1 @foo()
br i1 %c1, label %cont1, label %a1		br i1 %c1, label %cont1, label %a1
a1:		a1:
call void @assert_fail_1(i32 0)		call void @assert_fail_1(i32 0)
unreachable		unreachable
Show All 12 Lines
cont3:		cont3:
ret void		ret void
}		}

define void @phi_three_constants() {		define void @phi_three_constants() {
; CHECK-LABEL: @phi_three_constants(		; CHECK-LABEL: @phi_three_constants(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[C1:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C1:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C1]], label [[CONT1:%.*]], label [[A1:%.*]]		; CHECK-NEXT: br i1 [[C1]], label [[CONT1:%.*]], label [[COMMON_UNREACHABLE:%.*]]
; CHECK: a1:		; CHECK: common.unreachable:
; CHECK-NEXT: call void @assert_fail_1(i32 0)		; CHECK-NEXT: [[DOTSINK:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 1, [[CONT1]] ], [ 2, [[CONT2:%.*]] ]
		; CHECK-NEXT: call void @assert_fail_1(i32 [[DOTSINK]])
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
; CHECK: cont1:		; CHECK: cont1:
; CHECK-NEXT: [[C2:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C2:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C2]], label [[CONT2:%.*]], label [[A2:%.*]]		; CHECK-NEXT: br i1 [[C2]], label [[CONT2]], label [[COMMON_UNREACHABLE]]
; CHECK: a2:
; CHECK-NEXT: call void @assert_fail_1(i32 1)
; CHECK-NEXT: unreachable
; CHECK: cont2:		; CHECK: cont2:
; CHECK-NEXT: [[C3:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C3:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C3]], label [[CONT3:%.*]], label [[A3:%.*]]		; CHECK-NEXT: br i1 [[C3]], label [[CONT3:%.*]], label [[COMMON_UNREACHABLE]]
; CHECK: a3:
; CHECK-NEXT: call void @assert_fail_1(i32 2)
; CHECK-NEXT: unreachable
; CHECK: cont3:		; CHECK: cont3:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%c1 = call i1 @foo()		%c1 = call i1 @foo()
br i1 %c1, label %cont1, label %a1		br i1 %c1, label %cont1, label %a1
a1:		a1:
call void @assert_fail_1(i32 0)		call void @assert_fail_1(i32 0)
[12 lines not shown]	a3:
unreachable		unreachable
cont3:		cont3:
ret void		ret void
}		}

define void @dont_phi_values(i32 %x, i32 %y) {		define void @dont_phi_values(i32 %x, i32 %y) {
; CHECK-LABEL: @dont_phi_values(		; CHECK-LABEL: @dont_phi_values(
; CHECK-NEXT: [[C1:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C1:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C1]], label [[CONT1:%.*]], label [[A1:%.*]]		; CHECK-NEXT: br i1 [[C1]], label [[CONT1:%.*]], label [[COMMON_UNREACHABLE:%.*]]
; CHECK: a1:		; CHECK: common.unreachable:
; CHECK-NEXT: call void @assert_fail_1(i32 [[X:%.*]])		; CHECK-NEXT: [[Y_SINK:%.*]] = phi i32 [ [[X:%.*]], [[TMP0:%.*]] ], [ [[Y:%.*]], [[CONT1]] ]
		; CHECK-NEXT: call void @assert_fail_1(i32 [[Y_SINK]])
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
; CHECK: cont1:		; CHECK: cont1:
; CHECK-NEXT: [[C2:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C2:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C2]], label [[CONT2:%.*]], label [[A2:%.*]]		; CHECK-NEXT: br i1 [[C2]], label [[CONT2:%.*]], label [[COMMON_UNREACHABLE]]
; CHECK: a2:
; CHECK-NEXT: call void @assert_fail_1(i32 [[Y:%.*]])
; CHECK-NEXT: unreachable
; CHECK: cont2:		; CHECK: cont2:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
%c1 = call i1 @foo()		%c1 = call i1 @foo()
br i1 %c1, label %cont1, label %a1		br i1 %c1, label %cont1, label %a1
a1:		a1:
call void @assert_fail_1(i32 %x)		call void @assert_fail_1(i32 %x)
unreachable		unreachable
[15 lines not shown]
; CHECK-NEXT: [[C2:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C2:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C2]], label [[CONT2:%.*]], label [[A2:%.*]]		; CHECK-NEXT: br i1 [[C2]], label [[CONT2:%.*]], label [[A2:%.*]]
; CHECK: cont2:		; CHECK: cont2:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
; CHECK: a1:		; CHECK: a1:
; CHECK-NEXT: call void @assert_fail_1(i32 0)		; CHECK-NEXT: call void @assert_fail_1(i32 0)
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
; CHECK: a2:		; CHECK: a2:
; CHECK-NEXT: call void @assert_fail_1_alt(i32 0)		; CHECK-NEXT: call void @assert_fail_1_alt(i32 0)
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
rnk:
I expected your code to fire on this test case. Can you explain why this example isn't getting tail merged?

Consider this example: https://gcc.godbolt.org/z/ox16a9P1z

    [[noreturn]] void abort1();
    [[noreturn]] void abort2();
    [[noreturn]] void abort3();
    bool cond();
    void doAsserts() {
      if (cond())
        abort1();
      if (cond())
        abort2();
      if (cond())
        abort3();
    }

I think it is more canonical to leave these unreachable terminators in place after the calls to noreturn functions, rather than merging the unreachables together. I just want to make sure your transform isn't firing, creating BBs, and then a later part of simplifycfg rolls the unreachables back up into place after the calls.
lebedev.ri (Author):
> I expected your code to fire on this test case. Can you explain why this example isn't getting tail merged?

It fired, we didn't sink anything, and `SimplifyCFGOpt::simplifyUnreachable()` decided to undo it.
rnk:
Got it, and we want to avoid that because otherwise it will make the overall pass return true to indicate that it changed something, which will make the parent pass manager re-run more passes.
lebedev.ri (Author):
True. As far as i'm aware, that only results in potentially invalidating analyses; i'm not aware of it triggering additional optimization pass runs. IIRC that part of `SimplifyCFGOpt::simplifyUnreachable()` is a pretty important canonicalization, because e.g. instcombine can't modify the CFG.
rnk:
Even if it only invalidates analyses, I think this is worth addressing before landing this. Ideally this code would directly call the heuristic that "sink from common predecessors" uses, but if that isn't available, I think you could approximate it by not merging unreachable terminators when the previous non-debug instruction is a noreturn call with distinct callees. We know that is unprofitable, and that accounts for most blocks ending in unreachable. It saves compile time from IR churn too.

@aeubanks, what are the consequences of passes indicating that they changed the IR when they actually didn't?
lebedev.ri (Author):
This is impossible to address.

> I think you could approximate it by not merging unreachable terminators when the previous non-debug instruction is a noreturn call with distinct callees.

I can not, because fixing lack of sinking in such cases is basically the very next step here.
rnk:
> I can not, because fixing lack of sinking in such cases is basically the very next step here.

This is the transform I'm talking about avoiding, and I don't think we plan to do this in the next step:

    bb1:
      call void @abort1()
      unreachable
    bb2:
      call void @abort2()
      unreachable
    ->
    bb1:
      br label %common
    bb2:
      br label %common
    common:
      %callee = phi ... @abort1 ... @abort2
      call void ... %callee()
      unreachable

Right? This would make direct calls indirect, which is less canonical. I doubt this is going to change soon.

> This is impossible to address.

I guess what you are saying is that this isn't possible to implement with the current code and data structures. We'd need to incorporate the structure of the instruction before unreachable into the map.

This makes me think maybe it would be better to extend the SinkFromCommonPredecessors logic to consider blocks ending in unreachable. The current code is essentially restructuring the CFG in a way that is convenient for that function. I think transforms should avoid changing the IR before they know if transformation is really profitable, and it seems like the profitability heuristic is over there.
lebedev.ri (Author):
> Right? This would make direct calls indirect, which is less canonical. I doubt this is going to change soon.

Right. This isn't going to change. However, consider

    bb1:
      call void @abort1()
      unreachable
    bb2:
      call void @abort2()
      unreachable
    bb3:
      call void @abort2()
      unreachable
    ->
    bb1:
      call void @abort1()
      unreachable
    bb2:
      br label %bb2.bb3.common
    bb2.bb3.common:
      call void @abort2()
      unreachable

Also, consider:

    bb1:
      call void @abort1()
      unreachable
    bb2:
      call void @abort1()
      br label %bb3
    bb3:
      unreachable

My point being, we can't realistically say that we will/won't succeed in sinking stuff.
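Editorial sketch: the first example in the comment above (merge only the blocks that call the same noreturn function) can be modelled in a few lines of Python. This is a toy model, not LLVM code: the `merge_candidates` helper and the dictionary representation of blocks are assumptions made purely for illustration. It also only looks at the call immediately before `unreachable`, so it does not capture the second example, where the call and the `unreachable` sit in different blocks.

```python
from collections import defaultdict


def merge_candidates(blocks):
    """Group blocks that end in `unreachable` by the callee of their final call.

    `blocks` maps a block name to the name of the function called right
    before `unreachable` (hypothetical representation, for illustration).
    Only callees shared by two or more blocks are returned, because merging
    blocks with distinct callees would require an indirect call.
    """
    by_callee = defaultdict(list)
    for name, callee in blocks.items():
        by_callee[callee].append(name)
    return {callee: names for callee, names in by_callee.items() if len(names) > 1}


# bb1 calls @abort1; bb2 and bb3 both call @abort2.
example = {"bb1": "abort1", "bb2": "abort2", "bb3": "abort2"}
print(merge_candidates(example))
```

In this model only `bb2` and `bb3` are reported as a merge group, while `bb1` keeps its own direct call.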
aeubanks:
there's no correctness issue with saying that we modified IR if we didn't actually, it'll just invalidate analyses, causing more work when they are recomputed in later passes

might be worth putting this through http://llvm-compile-time-tracker.com/
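Editorial sketch: the cost described above can be illustrated with a toy pass manager in Python. Everything here (`ToyPassManager`, the single cached analysis, the two no-op transforms) is a hypothetical model and does not mirror LLVM's actual pass manager API. It only shows that a transform which reports a change without making one leaves the results identical but forces the cached analysis to be recomputed.

```python
class ToyPassManager:
    """Caches one analysis result and drops it when a transform reports a change."""

    def __init__(self):
        self.cached = None
        self.recomputations = 0

    def get_analysis(self, ir):
        # Stand-in for an expensive analysis: computed only on a cache miss.
        if self.cached is None:
            self.cached = sorted(ir)
            self.recomputations += 1
        return self.cached

    def run(self, transform, ir):
        changed = transform(ir)
        if changed:
            self.cached = None  # conservatively invalidate the analysis
        return changed


def honest_noop(ir):
    return False  # did nothing and says so


def pessimistic_noop(ir):
    return True  # did nothing but claims a change


def count_recomputations(transform, rounds=3):
    pm = ToyPassManager()
    ir = ["bb2", "bb1"]
    for _ in range(rounds):
        pm.get_analysis(ir)
        pm.run(transform, ir)
    return pm.recomputations


print(count_recomputations(honest_noop), count_recomputations(pessimistic_noop))
```

The analysis result is the same in both cases; only the number of recomputations differs.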
lebedev.ri (Author):
Since you asked, sure: https://llvm-compile-time-tracker.com/compare.php?from=1f169a774cb865659cefe085e70a56a884e3711e&to=fc54bb9a8ef85bd76dd9e934b2546f4beadc5b5e&stat=instructions

I'm not sure what this tells us here. Since the instruction stat correlates with the size changes, i guess we could say that it led to more inlining, and more IR to chew through. Which is pretty much the expected outcome.
nikic:
This shows a 10% increase in code size on mafft with LTO and a few others also increase by multiple percent points.

Did you rerun @rnk's test on clang Release+Assert code size with this patch? It looks like large code size increases are still the blocker for this patch, as they were back then.
lebedev.ri (Author):
> This shows a 10% increase in code size on mafft with LTO and a few others also increase by multiple percent points.

Yep.

> Did you rerun @rnk's test on clang Release+Assert code size with this patch?

I have not because there is no reason to expect that the outcome is different. (It will be somewhat different, because the approach is somewhat different)

> It looks like large code size increases are still the blocker for this patch, as they were back then.

I'm not quite sure how we arrive at this conclusion. Let me make a comparison: when one tries to paint something, it is expected that not only will said something get colored, but the paint will also get used up. What i'm saying is that the effect this has is not unexpected, on the contrary, it is expected. We successfully decrease the amount of IR bloat by assertion blocks, decreasing the size of the functions they are in, and naturally that makes some of them more eligible for inlining, which happens, and increases codesize.

I would like to also call back to the discussion in D101468, where we had very much the same discussion, and actually i argued that it was bad, but @nikic argued that said change is good since we no longer overestimate the inlining cost, and if that leads to an overestimation, then the problem is in inliner. I'm not sure how in this patch the views changed to diametrically opposite ones :)
lebedev.ri (Author):
Forgot to mention: `"There are three kinds of lies: lies, damned lies, and statistics."`

The n% increase in code size is a pretty meaningless number, and i'm sad to see it being used in such a harsh blocking manner. What we should at least do, is look at how it compares with assert-less code. I don't yet have clang numbers, but here's some for RawSpeed:

    $ stat --printf="%s %n\n" build-release-*/src/utilities/rsbench/rsbench | sort
    17188264 build-release-new/src/utilities/rsbench/rsbench
    17234640 build-release-old/src/utilities/rsbench/rsbench
    17464336 build-release-with-asserts-new/src/utilities/rsbench/rsbench
    17508840 build-release-with-asserts-old/src/utilities/rsbench/rsbench

I.e. `-DNDEBUG`->`-UNDEBUG` is +1.6% increase, while `old`->`new` (i.e. this patch) causes -0.25% decrease.

Let me get these numbers for clang...
lebedev.ri (Author):
And here's clang numbers.

    $ stat --printf="%s %n\n" build-release-*/bin/clang-13 | sort
    103032240 build-release-old/bin/clang-13
    103046136 build-release-new/bin/clang-13
    123732704 build-release-with-asserts-old/bin/clang-13
    123882984 build-release-with-asserts-new/bin/clang-13

I.e. `-DNDEBUG`->`-UNDEBUG` is +20% increase, while old->new (i.e. this patch) causes +0.1% increase for the assert-ful build, and +0.01% for the assert-less one.

But what this really tells us is that the numbers will vary depending on the underlying libc implementation. The thing is, glibc's `__assert_fail()` has 4 arguments (the stringified assertion, filename, line, function), and in the worst-case scenario we'll need a PHI for each one of them, yet currently the profitability check only allows a single PHI. So to reproduce @rnk's numbers, he'd have to redo the test on whatever platform was used originally.

Another way to spell this: the regression will appear later when the profitability check is tuned :)
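Editorial sketch: the percentages quoted for RawSpeed and clang can be rechecked from the raw `stat` byte counts above. The snippet below is plain arithmetic; the `pct_change` helper and the dictionary keys are names introduced here for illustration only.

```python
def pct_change(old, new):
    """Relative size change from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# Binary sizes in bytes, copied from the stat output in the two comments above.
rsbench = {"release_old": 17234640, "release_new": 17188264,
           "asserts_old": 17508840, "asserts_new": 17464336}
clang = {"release_old": 103032240, "release_new": 103046136,
         "asserts_old": 123732704, "asserts_new": 123882984}

for name, sizes in (("rsbench", rsbench), ("clang", clang)):
    # Cost of enabling assertions, measured without the patch.
    asserts_cost = pct_change(sizes["release_old"], sizes["asserts_old"])
    # Effect of the patch on the assert-enabled and assert-less builds.
    patch_asserts = pct_change(sizes["asserts_old"], sizes["asserts_new"])
    patch_release = pct_change(sizes["release_old"], sizes["release_new"])
    print(f"{name}: asserts {asserts_cost:+.2f}%, "
          f"patch (asserts) {patch_asserts:+.2f}%, patch (release) {patch_release:+.2f}%")
```

After rounding, the results agree with the figures given in the comments: about +1.6% and -0.25% for rsbench, and about +20%, +0.1% and +0.01% for clang.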
rnk:
My reading of the llvm compile time tracker results is that this patch may result in a modest (~1% or less) compile time increase. That could be in the noise. The cost may mostly come from the analysis invalidation, which could be avoided if this were implemented in SinkCodeFromCommonPredecessors. I don't consider this a hard blocker if that is onerous.

> The n% increase in code size is a pretty meaningless number, and i'm sad to see it being used in such a harsh blocking manner.

I think it is more constructive to think of the engagement here as early feedback. People will notice code size increases and provide feedback, it's just a question of when. Reviewers are trying to be helpful.

W.r.t. `__assert_fail`, consider that CodeGen tail duplication may ultimately re-duplicate all the calls to `__assert_fail`. If that's the case, we shouldn't do this transform: it would throw away source location information for no gain. The more phis we create, the more likely it is to trigger tail duplication in the backend. However, there are many common calls to `__assert_fail` from things like the llvm::cast template. These are typically inlined and can be tail merged with one phi for the message; the file, line, and function should all be the same.

Anyway, are you set on this approach, or would you consider the proposed alternative?
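Editorial sketch: the "one phi per differing argument" reasoning in the two comments above can be modelled as follows. This is a toy model in Python, not the SimplifyCFG profitability check itself; `phis_needed`, `can_sink`, the `max_phis=1` default (taken from the statement that the check currently allows a single PHI) and all argument values are assumptions for illustration.

```python
def phis_needed(call_sites):
    """Count the argument positions whose values differ across the call sites.

    Each call site is a tuple of arguments passed to the same callee; every
    position that is not identical everywhere would need its own phi after
    the calls are merged into one block.
    """
    return sum(1 for args in zip(*call_sites) if len(set(args)) > 1)


def can_sink(call_sites, max_phis=1):
    """Toy profitability rule: merge only if at most `max_phis` phis are needed."""
    return phis_needed(call_sites) <= max_phis


# Same shape as @phi_three_constants: one i32 argument that differs per block.
one_arg = [(0,), (1,), (2,)]

# Hypothetical four-argument calls: (assertion, file, line, function).
same_location = [("isa<X>(V)", "Casting.h", 10, "cast"),
                 ("isa<Y>(V)", "Casting.h", 10, "cast")]
different_location = [("a > 0", "foo.c", 10, "f"),
                      ("b > 0", "foo.c", 20, "f")]

print(can_sink(one_arg), can_sink(same_location), can_sink(different_location))
```

Under these assumptions the single-argument calls and the calls that differ only in the message need one phi and are merged, while the calls that also differ in the line number need two phis and are left alone.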
lebedev.ri (Author):
(Note that the compile time numbers aren't final, because this lacks further profitability checks relaxation in sinking logic.)

> My reading of the llvm compile time tracker results is that this patch may result in a modest (~1% or less) compile time increase. That could be in the noise.

FWIW i do agree that this will obviously affect the compile time.

> The cost may mostly come from the analysis invalidation,

The avoidable portion of the cost. The cost that comes from succeeding in the transformation, and enabling whatever next transformation, will remain. I think i should mention that i believe that for any non-trivial function the simplifycfg will likely already report that it changed things, and for trivial things it shouldn't be too costly to recompute analysis. This may be a faulty view, but that is what i think.

> > The n% increase in code size is a pretty meaningless number, and i'm sad to see it being used in such a harsh blocking manner.
> I think it is more constructive to think of the engagement here as early feedback. People will notice code size increases and provide feedback, it's just a question of when. Reviewers are trying to be helpful.

Right, i agree, though it didn't quite read as such to me, but it may be again just that the tone doesn't quite always roundtrip perfectly through translation.

> which could be avoided if this were implemented in SinkCodeFromCommonPredecessors. I don't consider this a hard blocker if that is onerous.
> <...>
> Anyway, are you set on this approach, or would you consider the proposed alternative?

Hmm, wait, i lost the thought. What were implemented in where? The profitability check before tail-merging? Or not doing tail-merging eagerly/early, but instead when visiting a function-terminating block, scan the function for all the blocks with the same terminator and only tail-merge iff we can actually sink? The latter sounds like it will result in some quadratic behavior.
lebedev.ri (Author):
@rnk i've finally implemented the "lossless" variant that we have discussed here in https://reviews.llvm.org/D116692

I think it's rather ugly and unprecedented, but let's see what https://llvm-compile-time-tracker.com/compare.php?from=2353e1c87b09c20e75f0f3ceb05fa4a4261fe3dd&to=bed7b8df4565f4503889a19235e853b985ca3481&stat=instructions says...
nikic:
> What i'm saying is that the effect this has is not unexpected, on the contrary, it is expected. We successfully decrease the amount of IR bloat by assertion blocks, decreasing the size of the functions they are in, and naturally that makes some of them more eligible for inlining, which happens, and increases codesize.

Just to double check: Has someone confirmed that the code size increases we see are indeed caused (or caused primarily) by the inlining interaction?

> I would like to also call back to the discussion in D101468, where we had very much the same discussion, and actually i argued that it was bad, but @nikic argued that said change is good since we no longer overestimate the inlining cost, and if that leads to an overestimation, then the problem is in inliner. I'm not sure how in this patch the views changed to diametrically opposite ones :)

My view here hasn't really changed in principle, but the practical aspects here are quite different. D101468 had some code size wins, some losses, and an overall 0.1% regression. Here we see many regressions in the 1-10% range. D101468 was all of "the right thing to do", had a clear motivation for vectorization and had fairly limited code size impact. For this patch, it seems like "the right thing to do" on an abstract level (in the sense that we do the same thing for other terminators), but it also has a non-trivial code size impact and the overall motivation isn't clear to me. Or at least, the patch summary doesn't say what the larger motivation here is. I would have guessed that the motivation is to reduce code size by tail merging and sinking, but if in practice the reverse happens due to inlining interaction, then I'm not sure that really makes sense. Maybe all I'm really looking for is clarification on what the motivation / bigger picture here is?

> My reading of the llvm compile time tracker results is that this patch may result in a modest (~1% or less) compile time increase. That could be in the noise.

@rnk: The way to read the numbers is: Anything colored is likely not noise. For most benchmarks, the noise floor is <0.05%.
lebedev.ri (Author):
> > What i'm saying is that the effect this has is not unexpected, on the contrary, it is expected. We successfully decrease the amount of IR bloat by assertion blocks, decreasing the size of the functions they are in, and naturally that makes some of them more eligible for inlining, which happens, and increases codesize.
> Just to double check: Has someone confirmed that the code size increases we see are indeed caused (or caused primarily) by the inlining interaction?

I have just compared the stats for the vanilla test-suite as a whole, and 7zip specifically: while there is an increase in assembly instruction count, and increase in tail duplication, there is an increase in the count of IR instructions and decrease of the function/block count at the end of middle-end pipeline, and finally more `inline.NumInlined`. So i'm going to go with "yes".
;		;
%c1 = call i1 @foo()		%c1 = call i1 @foo()
br i1 %c1, label %cont1, label %a1		br i1 %c1, label %cont1, label %a1
cont1:		cont1:
%c2 = call i1 @foo()		%c2 = call i1 @foo()
br i1 %c2, label %cont2, label %a2		br i1 %c2, label %cont2, label %a2
cont2:		cont2:
ret void		ret void
[9 lines not shown]
declare i1 @bar()		declare i1 @bar()

define void @unmergeable_phis(i32 %v, i1 %c) {		define void @unmergeable_phis(i32 %v, i1 %c) {
; CHECK-LABEL: @unmergeable_phis(		; CHECK-LABEL: @unmergeable_phis(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: br i1 [[C:%.*]], label [[S1:%.*]], label [[S2:%.*]]		; CHECK-NEXT: br i1 [[C:%.*]], label [[S1:%.*]], label [[S2:%.*]]
; CHECK: s1:		; CHECK: s1:
; CHECK-NEXT: [[C1:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C1:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C1]], label [[A1:%.*]], label [[A2:%.*]]		; CHECK-NEXT: br i1 [[C1]], label [[COMMON_UNREACHABLE:%.*]], label [[A2:%.*]]
; CHECK: s2:		; CHECK: s2:
; CHECK-NEXT: [[C2:%.*]] = call i1 @bar()		; CHECK-NEXT: [[C2:%.*]] = call i1 @bar()
; CHECK-NEXT: br i1 [[C2]], label [[A1]], label [[A2]]		; CHECK-NEXT: br i1 [[C2]], label [[COMMON_UNREACHABLE]], label [[A2]]
; CHECK: a1:		; CHECK: common.unreachable:
; CHECK-NEXT: [[L1:%.*]] = phi i32 [ 0, [[S1]] ], [ 1, [[S2]] ]		; CHECK-NEXT: [[L2_SINK:%.*]] = phi i32 [ [[L2:%.*]], [[A2]] ], [ 0, [[S1]] ], [ 1, [[S2]] ]
; CHECK-NEXT: call void @assert_fail_1(i32 [[L1]])		; CHECK-NEXT: call void @assert_fail_1(i32 [[L2_SINK]])
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
; CHECK: a2:		; CHECK: a2:
; CHECK-NEXT: [[L2:%.*]] = phi i32 [ 2, [[S1]] ], [ 3, [[S2]] ]		; CHECK-NEXT: [[L2]] = phi i32 [ 2, [[S1]] ], [ 3, [[S2]] ]
; CHECK-NEXT: call void @assert_fail_1(i32 [[L2]])		; CHECK-NEXT: br label [[COMMON_UNREACHABLE]]
; CHECK-NEXT: unreachable
;		;
entry:		entry:
br i1 %c, label %s1, label %s2		br i1 %c, label %s1, label %s2
s1:		s1:
%c1 = call i1 @foo()		%c1 = call i1 @foo()
br i1 %c1, label %a1, label %a2		br i1 %c1, label %a1, label %a2
s2:		s2:
%c2 = call i1 @bar()		%c2 = call i1 @bar()
br i1 %c2, label %a1, label %a2		br i1 %c2, label %a1, label %a2
a1:		a1:
%l1 = phi i32 [ 0, %s1 ], [ 1, %s2 ]		%l1 = phi i32 [ 0, %s1 ], [ 1, %s2 ]
call void @assert_fail_1(i32 %l1)		call void @assert_fail_1(i32 %l1)
unreachable		unreachable
a2:		a2:
%l2 = phi i32 [ 2, %s1 ], [ 3, %s2 ]		%l2 = phi i32 [ 2, %s1 ], [ 3, %s2 ]
call void @assert_fail_1(i32 %l2)		call void @assert_fail_1(i32 %l2)
unreachable		unreachable
}		}

define void @tail_merge_switch(i32 %v) {		define void @tail_merge_switch(i32 %v) {
; CHECK-LABEL: @tail_merge_switch(		; CHECK-LABEL: @tail_merge_switch(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: switch i32 [[V:%.*]], label [[RET:%.*]] [		; CHECK-NEXT: switch i32 [[V:%.*]], label [[RET:%.*]] [
; CHECK-NEXT: i32 0, label [[A1:%.*]]		; CHECK-NEXT: i32 0, label [[COMMON_UNREACHABLE:%.*]]
; CHECK-NEXT: i32 13, label [[A2:%.*]]		; CHECK-NEXT: i32 13, label [[A2:%.*]]
; CHECK-NEXT: i32 42, label [[A3:%.*]]		; CHECK-NEXT: i32 42, label [[A3:%.*]]
; CHECK-NEXT: ]		; CHECK-NEXT: ]
; CHECK: a1:		; CHECK: common.unreachable:
; CHECK-NEXT: call void @assert_fail_1(i32 0)		; CHECK-NEXT: [[DOTSINK:%.*]] = phi i32 [ 2, [[A3]] ], [ 1, [[A2]] ], [ 0, [[ENTRY:%.*]] ]
		; CHECK-NEXT: call void @assert_fail_1(i32 [[DOTSINK]])
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
; CHECK: a2:		; CHECK: a2:
; CHECK-NEXT: call void @assert_fail_1(i32 1)		; CHECK-NEXT: br label [[COMMON_UNREACHABLE]]
; CHECK-NEXT: unreachable
; CHECK: a3:		; CHECK: a3:
; CHECK-NEXT: call void @assert_fail_1(i32 2)		; CHECK-NEXT: br label [[COMMON_UNREACHABLE]]
; CHECK-NEXT: unreachable
; CHECK: ret:		; CHECK: ret:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
switch i32 %v, label %ret [		switch i32 %v, label %ret [
i32 0, label %a1		i32 0, label %a1
i32 13, label %a2		i32 13, label %a2
i32 42, label %a3		i32 42, label %a3
[9 lines not shown]	a3:
unreachable		unreachable
ret:		ret:
ret void		ret void
}		}

define void @need_to_add_bb2_preds(i1 %c1) {		define void @need_to_add_bb2_preds(i1 %c1) {
; CHECK-LABEL: @need_to_add_bb2_preds(		; CHECK-LABEL: @need_to_add_bb2_preds(
; CHECK-NEXT: bb1:		; CHECK-NEXT: bb1:
; CHECK-NEXT: br i1 [[C1:%.*]], label [[BB2:%.*]], label [[A1:%.*]]		; CHECK-NEXT: br i1 [[C1:%.*]], label [[BB2:%.*]], label [[COMMON_UNREACHABLE:%.*]]
; CHECK: bb2:		; CHECK: bb2:
; CHECK-NEXT: [[C2:%.*]] = call i1 @bar()		; CHECK-NEXT: [[C2:%.*]] = call i1 @bar()
; CHECK-NEXT: br i1 [[C2]], label [[A2:%.*]], label [[A3:%.*]]		; CHECK-NEXT: [[SPEC_SELECT:%.*]] = select i1 [[C2]], i32 1, i32 2
; CHECK: a1:		; CHECK-NEXT: br label [[COMMON_UNREACHABLE]]
; CHECK-NEXT: call void @assert_fail_1(i32 0)		; CHECK: common.unreachable:
; CHECK-NEXT: unreachable		; CHECK-NEXT: [[DOTSINK:%.*]] = phi i32 [ 0, [[BB1:%.*]] ], [ [[SPEC_SELECT]], [[BB2]] ]
; CHECK: a2:		; CHECK-NEXT: call void @assert_fail_1(i32 [[DOTSINK]])
; CHECK-NEXT: call void @assert_fail_1(i32 1)
; CHECK-NEXT: unreachable
; CHECK: a3:
; CHECK-NEXT: call void @assert_fail_1(i32 2)
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
;		;
bb1:		bb1:
br i1 %c1, label %bb2, label %a1		br i1 %c1, label %bb2, label %a1
bb2:		bb2:
%c2 = call i1 @bar()		%c2 = call i1 @bar()
br i1 %c2, label %a2, label %a3		br i1 %c2, label %a2, label %a3

a1:		a1:
call void @assert_fail_1(i32 0)		call void @assert_fail_1(i32 0)
unreachable		unreachable
a2:		a2:
call void @assert_fail_1(i32 1)		call void @assert_fail_1(i32 1)
unreachable		unreachable
a3:		a3:
call void @assert_fail_1(i32 2)		call void @assert_fail_1(i32 2)
unreachable		unreachable
}		}

define void @phi_in_bb2() {		define void @phi_in_bb2() {
; CHECK-LABEL: @phi_in_bb2(		; CHECK-LABEL: @phi_in_bb2(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[C1:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C1:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C1]], label [[CONT1:%.*]], label [[A1:%.*]]		; CHECK-NEXT: br i1 [[C1]], label [[CONT1:%.*]], label [[COMMON_UNREACHABLE:%.*]]
; CHECK: a1:		; CHECK: common.unreachable:
; CHECK-NEXT: call void @assert_fail_1(i32 0)		; CHECK-NEXT: [[P2_SINK:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 1, [[CONT1]] ], [ 2, [[CONT2:%.*]] ]
		; CHECK-NEXT: call void @assert_fail_1(i32 [[P2_SINK]])
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
; CHECK: cont1:		; CHECK: cont1:
; CHECK-NEXT: [[C2:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C2:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C2]], label [[CONT2:%.*]], label [[A2:%.*]]		; CHECK-NEXT: br i1 [[C2]], label [[CONT2]], label [[COMMON_UNREACHABLE]]
; CHECK: a2:
; CHECK-NEXT: [[P2:%.*]] = phi i32 [ 1, [[CONT1]] ], [ 2, [[CONT2]] ]
; CHECK-NEXT: call void @assert_fail_1(i32 [[P2]])
; CHECK-NEXT: unreachable
; CHECK: cont2:		; CHECK: cont2:
; CHECK-NEXT: [[C3:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C3:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C3]], label [[CONT3:%.*]], label [[A2]]		; CHECK-NEXT: br i1 [[C3]], label [[CONT3:%.*]], label [[COMMON_UNREACHABLE]]
; CHECK: cont3:		; CHECK: cont3:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%c1 = call i1 @foo()		%c1 = call i1 @foo()
br i1 %c1, label %cont1, label %a1		br i1 %c1, label %cont1, label %a1
a1:		a1:
call void @assert_fail_1(i32 0)		call void @assert_fail_1(i32 0)
[32 lines not shown]
; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[X]] to i8*		; CHECK-NEXT: [[TMP0:%.*]] = bitcast i32* [[X]] to i8*
; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull [[TMP0]])		; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull [[TMP0]])
; CHECK-NEXT: store i32 0, i32* [[X]], align 4		; CHECK-NEXT: store i32 0, i32* [[X]], align 4
; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[C2:%.*]], 0		; CHECK-NEXT: [[TOBOOL:%.*]] = icmp eq i32 [[C2:%.*]], 0
; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_END:%.*]], label [[IF_THEN1:%.*]]		; CHECK-NEXT: br i1 [[TOBOOL]], label [[IF_END:%.*]], label [[IF_THEN1:%.*]]
; CHECK: if.then1:		; CHECK: if.then1:
; CHECK-NEXT: call void @escape_i32_ptr(i32* nonnull [[X]])		; CHECK-NEXT: call void @escape_i32_ptr(i32* nonnull [[X]])
; CHECK-NEXT: br label [[IF_END]]		; CHECK-NEXT: br label [[IF_END]]
; CHECK: if.end:		; CHECK: common.unreachable:
; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull [[TMP0]])
; CHECK-NEXT: call void @abort()		; CHECK-NEXT: call void @abort()
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
		; CHECK: if.end:
		; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull [[TMP0]])
		; CHECK-NEXT: br label [[COMMON_UNREACHABLE:%.*]]
; CHECK: if.then3:		; CHECK: if.then3:
; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[Y]] to i8*		; CHECK-NEXT: [[TMP1:%.*]] = bitcast i32* [[Y]] to i8*
; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull [[TMP1]])		; CHECK-NEXT: call void @llvm.lifetime.start.p0i8(i64 4, i8* nonnull [[TMP1]])
; CHECK-NEXT: store i32 0, i32* [[Y]], align 4		; CHECK-NEXT: store i32 0, i32* [[Y]], align 4
; CHECK-NEXT: [[TOBOOL5:%.*]] = icmp eq i32 [[C2]], 0		; CHECK-NEXT: [[TOBOOL5:%.*]] = icmp eq i32 [[C2]], 0
; CHECK-NEXT: br i1 [[TOBOOL5]], label [[IF_END7:%.*]], label [[IF_THEN6:%.*]]		; CHECK-NEXT: br i1 [[TOBOOL5]], label [[IF_END7:%.*]], label [[IF_THEN6:%.*]]
; CHECK: if.then6:		; CHECK: if.then6:
; CHECK-NEXT: call void @escape_i32_ptr(i32* nonnull [[Y]])		; CHECK-NEXT: call void @escape_i32_ptr(i32* nonnull [[Y]])
; CHECK-NEXT: br label [[IF_END7]]		; CHECK-NEXT: br label [[IF_END7]]
; CHECK: if.end7:		; CHECK: if.end7:
; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull [[TMP1]])		; CHECK-NEXT: call void @llvm.lifetime.end.p0i8(i64 4, i8* nonnull [[TMP1]])
; CHECK-NEXT: call void @abort()		; CHECK-NEXT: br label [[COMMON_UNREACHABLE]]
; CHECK-NEXT: unreachable
; CHECK: if.end9:		; CHECK: if.end9:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%x = alloca i32, align 4		%x = alloca i32, align 4
%y = alloca i32, align 4		%y = alloca i32, align 4
switch i32 %c1, label %if.end9 [		switch i32 %c1, label %if.end9 [
i32 13, label %if.then		i32 13, label %if.then
[39 lines not shown]
; Dead phis in the block need to be handled.		; Dead phis in the block need to be handled.

declare void @llvm.dbg.value(metadata, i64, metadata, metadata)		declare void @llvm.dbg.value(metadata, i64, metadata, metadata)

define void @dead_phi() {		define void @dead_phi() {
; CHECK-LABEL: @dead_phi(		; CHECK-LABEL: @dead_phi(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: [[C1:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C1:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C1]], label [[CONT1:%.*]], label [[A1:%.*]]		; CHECK-NEXT: br i1 [[C1]], label [[CONT1:%.*]], label [[COMMON_UNREACHABLE:%.*]]
; CHECK: a1:		; CHECK: common.unreachable:
; CHECK-NEXT: [[DEAD:%.*]] = phi i32 [ 0, [[ENTRY:%.*]] ], [ 1, [[CONT1]] ]
; CHECK-NEXT: call void @assert_fail_1(i32 0)		; CHECK-NEXT: call void @assert_fail_1(i32 0)
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
; CHECK: cont1:		; CHECK: cont1:
; CHECK-NEXT: [[C2:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C2:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C2]], label [[CONT2:%.*]], label [[A1]]		; CHECK-NEXT: br i1 [[C2]], label [[CONT2:%.*]], label [[COMMON_UNREACHABLE]]
; CHECK: cont2:		; CHECK: cont2:
; CHECK-NEXT: [[C3:%.*]] = call i1 @foo()		; CHECK-NEXT: [[C3:%.*]] = call i1 @foo()
; CHECK-NEXT: br i1 [[C3]], label [[CONT3:%.*]], label [[A3:%.*]]		; CHECK-NEXT: br i1 [[C3]], label [[CONT3:%.*]], label [[COMMON_UNREACHABLE]]
; CHECK: a3:
; CHECK-NEXT: call void @assert_fail_1(i32 0)
; CHECK-NEXT: unreachable
; CHECK: cont3:		; CHECK: cont3:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
%c1 = call i1 @foo()		%c1 = call i1 @foo()
br i1 %c1, label %cont1, label %a1		br i1 %c1, label %cont1, label %a1
a1:		a1:
%dead = phi i32 [ 0, %entry ], [ 1, %cont1 ]		%dead = phi i32 [ 0, %entry ], [ 1, %cont1 ]
; [12 lines collapsed in the diff view; context: cont3:]
ret void		ret void
}		}

define void @strip_dbg_value(i32 %c) {		define void @strip_dbg_value(i32 %c) {
; CHECK-LABEL: @strip_dbg_value(		; CHECK-LABEL: @strip_dbg_value(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 [[C:%.*]], metadata [[META5:![0-9]+]], metadata !DIExpression()), !dbg [[DBG7:![0-9]+]]		; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 [[C:%.*]], metadata [[META5:![0-9]+]], metadata !DIExpression()), !dbg [[DBG7:![0-9]+]]
; CHECK-NEXT: switch i32 [[C]], label [[SW_EPILOG:%.*]] [		; CHECK-NEXT: switch i32 [[C]], label [[SW_EPILOG:%.*]] [
; CHECK-NEXT: i32 13, label [[SW_BB:%.*]]		; CHECK-NEXT: i32 13, label [[COMMON_UNREACHABLE:%.*]]
; CHECK-NEXT: i32 42, label [[SW_BB1:%.*]]		; CHECK-NEXT: i32 42, label [[COMMON_UNREACHABLE]]
; CHECK-NEXT: ]		; CHECK-NEXT: ]
; CHECK: sw.bb:		; CHECK: common.unreachable:
; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 55, metadata [[META5]], metadata !DIExpression()), !dbg [[DBG7]]
; CHECK-NEXT: tail call void @abort()
; CHECK-NEXT: unreachable
; CHECK: sw.bb1:
; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 67, metadata [[META5]], metadata !DIExpression()), !dbg [[DBG7]]
; CHECK-NEXT: tail call void @abort()		; CHECK-NEXT: tail call void @abort()
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
; CHECK: sw.epilog:		; CHECK: sw.epilog:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
call void @llvm.dbg.value(metadata i32 %c, i64 0, metadata !12, metadata !13), !dbg !14		call void @llvm.dbg.value(metadata i32 %c, i64 0, metadata !12, metadata !13), !dbg !14
switch i32 %c, label %sw.epilog [		switch i32 %c, label %sw.epilog [
; [15 lines collapsed in the diff view; context: sw.epilog: ; preds = %entry]
ret void		ret void
}		}

define void @dead_phi_and_dbg(i32 %c) {		define void @dead_phi_and_dbg(i32 %c) {
; CHECK-LABEL: @dead_phi_and_dbg(		; CHECK-LABEL: @dead_phi_and_dbg(
; CHECK-NEXT: entry:		; CHECK-NEXT: entry:
; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 [[C:%.*]], metadata [[META5]], metadata !DIExpression()), !dbg [[DBG7]]		; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 [[C:%.*]], metadata [[META5]], metadata !DIExpression()), !dbg [[DBG7]]
; CHECK-NEXT: switch i32 [[C]], label [[SW_EPILOG:%.*]] [		; CHECK-NEXT: switch i32 [[C]], label [[SW_EPILOG:%.*]] [
; CHECK-NEXT: i32 13, label [[SW_BB:%.*]]		; CHECK-NEXT: i32 13, label [[COMMON_UNREACHABLE:%.*]]
; CHECK-NEXT: i32 42, label [[SW_BB1:%.*]]		; CHECK-NEXT: i32 42, label [[COMMON_UNREACHABLE]]
; CHECK-NEXT: i32 53, label [[SW_BB2:%.*]]		; CHECK-NEXT: i32 53, label [[COMMON_UNREACHABLE]]
; CHECK-NEXT: ]		; CHECK-NEXT: ]
; CHECK: sw.bb:		; CHECK: common.unreachable:
; CHECK-NEXT: [[C_1:%.*]] = phi i32 [ 55, [[ENTRY:%.*]] ], [ 67, [[SW_BB1]] ]
; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 [[C_1]], metadata [[META5]], metadata !DIExpression()), !dbg [[DBG7]]
; CHECK-NEXT: tail call void @abort()
; CHECK-NEXT: unreachable
; CHECK: sw.bb1:
; CHECK-NEXT: br label [[SW_BB]]
; CHECK: sw.bb2:
; CHECK-NEXT: call void @llvm.dbg.value(metadata i32 84, metadata [[META5]], metadata !DIExpression()), !dbg [[DBG7]]
; CHECK-NEXT: tail call void @abort()		; CHECK-NEXT: tail call void @abort()
; CHECK-NEXT: unreachable		; CHECK-NEXT: unreachable
; CHECK: sw.epilog:		; CHECK: sw.epilog:
; CHECK-NEXT: ret void		; CHECK-NEXT: ret void
;		;
entry:		entry:
call void @llvm.dbg.value(metadata i32 %c, i64 0, metadata !12, metadata !13), !dbg !14		call void @llvm.dbg.value(metadata i32 %c, i64 0, metadata !12, metadata !13), !dbg !14
switch i32 %c, label %sw.epilog [		switch i32 %c, label %sw.epilog [
; [35 lines collapsed in the diff view]

Revision Contents

Diff 354321

llvm/lib/Transforms/Scalar/SimplifyCFGPass.cpp

llvm/test/CodeGen/Thumb2/setjmp_longjmp.ll

llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll
