This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/CodeGen/
-
CodeGen/
8
CodeGenPrepare.cpp
-
test/CodeGen/
-
CodeGen/
-
MIR/Generic/
-
Generic/
3
early-ret-bb.mir
-
PowerPC/
-
early-exit-shrink-wrap.ll

Differential D36504

[CodeGenPrepare][WIP] Convert uncond. branch to return into a return to help with shrink-wrapping
AbandonedPublic

Authored by sfertile on Aug 8 2017, 9:55 PM.

Download Raw Diff

Details

Reviewers

hfinkel
echristo
kbarton
efriedma
spatel
lei
nemanjai

Summary

Currently, CGP will get rid of mostly empty blocks by merging them with their successors. However, in a case where a block BB simply branches unconditionally to a block that has no code other than a return, it can be helpful if BB just returned since shrink-wrapping might have another candidate for the epilogue.

Motivating test case:

; Function Attrs: noinline nounwind readnone
define signext i32 @callee(i32 signext %a, i32 signext %lim) local_unnamed_addr #0 {
entry:
  %cmp5 = icmp sgt i32 %lim, 0
  br i1 %cmp5, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  br label %for.body

for.cond.cleanup:                                 ; preds = %for.body, %entry
  %Ret.0.lcssa = phi i32 [ 0, %entry ], [ %0, %for.body ]
  ret i32 %Ret.0.lcssa

for.body:                                         ; preds = %for.body.preheader, %for.body
  %i.07 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
  %Ret.06 = phi i32 [ %0, %for.body ], [ 0, %for.body.preheader ]
  %0 = tail call i32 asm "add $0, $1, $2", "=r,r,r,~{r14},~{r15},~{r16},~{r17},~{r18},~{r19},~{r20},~{r21},~{r22},~{r23},~{r24},~{r25},~{r26},~{r27},~{r28},~{r29},~{r30},~{r31}"(i32 %a, i32 %Ret.06)
  %inc = add nuw nsw i32 %i.07, 1
  %exitcond = icmp eq i32 %inc, %lim
  br i1 %exitcond, label %for.cond.cleanup, label %for.body

IR into Codegen Prepare:

; Function Attrs: noinline nounwind readnone
define signext i32 @caller(i32 signext %a, i32 signext %lim) local_unnamed_addr #0 {
entry:
  %cmp5 = icmp sgt i32 %lim, 0
  br i1 %cmp5, label %for.body.preheader, label %for.cond.cleanup

for.body.preheader:                               ; preds = %entry
  br label %for.body

for.cond.cleanup.loopexit:                        ; preds = %for.body
  %.lcssa = phi i32 [ %0, %for.body ]
  br label %for.cond.cleanup

for.cond.cleanup:                                 ; preds = %for.cond.cleanup.loopexit, %entry
  %Ret.0.lcssa = phi i32 [ 0, %entry ], [ %.lcssa, %for.cond.cleanup.loopexit ]
  ret i32 %Ret.0.lcssa

for.body:                                         ; preds = %for.body, %for.body.preheader
  %lsr.iv = phi i32 [ %lsr.iv.next, %for.body ], [ %lim, %for.body.preheader ]
  %Ret.06 = phi i32 [ %0, %for.body ], [ 0, %for.body.preheader ]
  %0 = tail call i32 asm "add $0, $1, $2", "=r,r,r,~{r14},~{r15},~{r16},~{r17},~{r18},~{r19},~{r20},~{r21},~{r22},~{r23},~{r24},~{r25},~{r26},~{r27},~{r28},~{r29},~{r30},~{r31}"(i32 %a, i32 %Ret.06) #3, !srcloc !3
  %lsr.iv.next = add i32 %lsr.iv, -1
  %exitcond = icmp eq i32 %lsr.iv.next, 0
  br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
}

Block for.cond.cleanup.loopexit can be used for the epilogue if flow has gone through for.body whereas block for.cond.cleanup can be used for the epilogue if flow never went into the loop.

The patch certainly increases the amount of shrink-wrapping we do. I'm currently running spec on PPC and will post the net change once the results are ready.

Diff Detail

Repository: rL LLVM

Event Timeline

nemanjai created this revision.Aug 8 2017, 9:55 PM

As one might expect, this causes a fair number of lit test cases to fail (65 to be more precise). Rather than sinking a whole bunch of time initially on updating them and producing a very large patch, I wanted to get a sense of whether the reviewers are interested in proceeding with this patch first.

So if we do want this (and we do want it to trigger for all targets), I'll update the test cases upon hearing so.

lib/CodeGen/CodeGenPrepare.cpp
332	I initially meant to add a target hook to see if this is desired by the target, but decided against it as I don't see an issue with always doing so. Depending on the review comments, I can either remove this comment or re-add the target hook.

nemanjai retitled this revision from [CodeGenPrepare] Convert uncond. branch to return into a return to help with shrink-wrapping to [CodeGenPrepare][WIP] Convert uncond. branch to return into a return to help with shrink-wrapping.Aug 8 2017, 10:34 PM

junbuml added a subscriber: junbuml.Aug 9 2017, 7:44 AM

mcrosier added a subscriber: bmakam.Aug 9 2017, 8:09 AM

Just some stats (on PPC):

3.8% more functions shrink-wrapped in the bootstrap build
14.3% more functions shrink-wrapped in SPEC

Performance wise SPEC-FP shows no significant change.
SPEC-INT shows a 7% improvement in Xalancbmx, 1% improvement in Sjeng, slightly under 5% degradation in h264ref and sub-1% improvements on everything else.

Overall, I agree with the idea. I didn't see any performance differences in Xalancbmk and h264ref on AArach64 though. I guess this will not leave a branch instruction in the empty block as it directly return.
Should the predecessor branching to an empty return block also be empty? Your implementation doesn't seem to check if the predecessor is empty.
Just curious about the -5% in h264ref on PPC?

lib/CodeGen/CodeGenPrepare.cpp
653	Isn't it possible to do this in the main loop of eliminateMostlyEmptyBlocks()?
2625–2627	It seems that you didn't check if the predecessor is empty? Is this what you intended?
2639–2640	I guess you might reduce DEBUGs in this function. Right?
test/CodeGen/MIR/Generic/early-ret-bb.mir
1	Why do you use .mir, instead of .ll?

In D36504#836904, @junbuml wrote:

Overall, I agree with the idea. I didn't see any performance differences in Xalancbmk and h264ref on AArach64 though. I guess this will not leave a branch instruction in the empty block as it directly return.

Should the predecessor branching to an empty return block also be empty? Your implementation doesn't seem to check if the predecessor is empty.

Just curious about the -5% in h264ref on PPC?

Do you see any performance difference in any of your benchmarks (i.e. other than Xalan and h264ref)?
I haven't investigated the degradation on h264ref yet, but I do plan to look into it before making further updates to this patch. I figured it'd be nice to hear from other targets as well about whether this has any impact.

lib/CodeGen/CodeGenPrepare.cpp
653	I initially had it there but removed it for two reasons (neither of which I feel too strongly about): It just didn't feel like it should go there since I'm doing an inherently different thing That loop skips the entry block and I didn't see a compelling reason to do so for this. Of course, I could just have one invocation before the loop but this looked cleaner. As I said, I'm not opposed to putting it back.
2625–2627	I initially had a check that the predecessor has a single instruction (i.e. the branch). But as it turns out, the predecessor will actually have a PHI as well in the motivating test case (albeit a PHI with a single entry). Then I got to thinking about whether there's a compelling reason for the predecessor to have to be empty. And I couldn't think of one. I suppose we can check if it: has a single instruction has a single non-PHI instruction has a single instruction or a single predecessor Not sure which of those would be the most appropriate (I am leaning toward the middle one).
2639–2640	Yeah. It was nice during development to have all of that. And I figured I'd leave it here while this patch is WIP for anyone that wants to play with the patch. But if this becomes a real patch, I'll reduce the DEBUG messages.
test/CodeGen/MIR/Generic/early-ret-bb.mir
1	I do have a .ll test case as well. I wanted to also provide a .mir test case to test the actual transformation in CGP rather than just its effect on the final ASM.

Do you see any performance difference in any of your benchmarks (i.e. other than Xalan and h264ref)?

I try to test this on AArch64 and share the result.

lib/CodeGen/CodeGenPrepare.cpp
2625–2627	Based on your summary and function name (dupRetFedByEmptyBlocks), I just expected that you check if BB is empty. However, your implementation doesn't seem to really check if BB is empty or not. So, I wanted to understand your intention. I also doubt if BB really need to be a kind of empty? For me, it seems okay to duplicate return to BB even when BB is not empty as you implement right now. I think this need to be done before eliminateMostlyEmptyBlocks(), but BB doesn't need to be empty. Please let me know if there is any downside of doing this when BB is non-empty.
test/CodeGen/MIR/Generic/early-ret-bb.mir
1	Why don't you use opt : opt < %s -codegenprepare

guyblank added a subscriber: guyblank.Aug 10 2017, 7:06 AM

Ping.
I am just wondering if any other targets have had a chance to try this out and whether there's any interest in proceeding with this patch as a result.
I will certainly address the comments raised by @junbuml if this patch has a future, but I'd like to evaluate whether there's any interest in proceeding before doing more work on this patch. Perhaps I'll open this up to a wider audience with a quick RFC email on llvm-dev.

Hi Nemanja,

Sorry for the delay. I tried it on AArach64 for spec2006/2000, and I didn't see any clear difference in performance. However, I do not disagree with the idea.
Thanks,
Jun

Isn't tail duplication a better place to do this?

In D36504#909077, @kparzysz wrote:

Isn't tail duplication a better place to do this?

Possibly, I'll have a look at that.

sfertile commandeered this revision.Oct 27 2017, 10:02 AM

sfertile added a reviewer: nemanjai.

sfertile edited the summary of this revision. (Show Details)Nov 1 2017, 12:10 PM

sfertile edited the summary of this revision. (Show Details)Nov 1 2017, 12:13 PM

In D36504#909077, @kparzysz wrote:

Isn't tail duplication a better place to do this?

So I attempted this in the PreRA tail-duplication pass. Unfortunately by the time we get there the opportunity is gone. If we do not duplicate the return then CodegenPrepare will end up deleting the for.cond.cleanup.loopexit block and we loose the shrink wrapping opportunity.

In D36504#913010, @sfertile wrote:

So I attempted this in the PreRA tail-duplication pass. Unfortunately by the time we get there the opportunity is gone. If we do not duplicate the return then CodegenPrepare will end up deleting the for.cond.cleanup.loopexit block and we loose the shrink wrapping opportunity.

Then perhaps shrink-wrapping should be extended to be able to add blocks. CodeGenPrepare is not the place for this.

thegameg added a subscriber: thegameg.Nov 2 2017, 8:01 AM

In D36504#913949, @kparzysz wrote:
CodeGenPrepare is not the place for this.

Thanks for the feedback. Can you elaborate a bit more on why we wouldn't want to do this in CodeGenPrepare? Is the issue that we are always duplicating the return right now? If thats the problem then we can try to come-up with situations where it may be profitable to duplicate the return and only do so under those circumstances. Also shrink-wrapping isn't the only thing helped by this. This also addresses an issue where we miss tail-call opportunities because the result of the call feeds a PHI which feeds the return instruction. By duplicating the return into the predecessor and removing the PHI we are able to tail call. In this case we would have to do the transformation here since we decide whether to tail-call or not during the SDag lowering, which comes next and only has access to a single basic block at a time.

In D36504#913949, @kparzysz wrote:

Then perhaps shrink-wrapping should be extended to be able to add blocks.

I don't know enough about shrink wrapping to know if that is feasible or not. It doesn't seem practical to me, but maybe Francis or Quentin can comment?

I don't know enough about shrink wrapping to know if that is feasible or not. It doesn't seem practical to me, but maybe Francis or Quentin can comment?

We'd actually want to do something relate to solve https://bugs.llvm.org/show_bug.cgi?id=33868.
We didn't have a chance to look at it yet.

What I'd like to have is a way to simulate critical edge splitting in DT and only materialize the basic block if we end up inserting something into it.

Don't know if splitting critical edges would solve what you are supporting here, though.

That being said, like Krzysztof said, I don't think CodeGenPrepare is the right place for this.

In D36504#914201, @sfertile wrote:

Thanks for the feedback. Can you elaborate a bit more on why we wouldn't want to do this in CodeGenPrepare?

Generally speaking, CGP is kind of a hack. We should work to take away from it, not to add to it. From what I know, the main motivation for CGP was to address some deficiencies of the instruction selection via the selection DAG, so we should at least try to limit it to doing that.

Opportunistically adding simple transformations to where it's convenient, not to where they belong logically is a suboptimal long-term strategy, although it looks appealing in the short term. Writing an optimization that can create shrink-wrapping opportunities on optimized code is certainly more involved, but will likely provide better results.

In D36504#914216, @kparzysz wrote:

In D36504#914201, @sfertile wrote:

Thanks for the feedback. Can you elaborate a bit more on why we wouldn't want to do this in CodeGenPrepare?

Generally speaking, CGP is kind of a hack. We should work to take away from it, not to add to it. From what I know, the main motivation for CGP was to address some deficiencies of the instruction selection via the selection DAG, so we should at least try to limit it to doing that.

Thanks, I didn't realize that was CGP main motivation.

In D36504#914216, @kparzysz wrote:

In D36504#914201, @sfertile wrote:

Thanks for the feedback. Can you elaborate a bit more on why we wouldn't want to do this in CodeGenPrepare?

Opportunistically adding simple transformations to where it's convenient, not to where they belong logically is a suboptimal long-term strategy, although it looks appealing in the short term. Writing an optimization that can create shrink-wrapping opportunities on optimized code is certainly more involved, but will likely provide better results.

Yes, I can appreciate that. Abandoning this patch to address it properly makes sense.

What is people opinion on duplicating the return for the very specific case of enabling a tail-call? I don't think that can be cleaned up inside the selection-dag, and after the selection-dag is too late. Is cgp the best place to address that or is there somewhere better to consider?

In D36504#914213, @qcolombet wrote:

I don't know enough about shrink wrapping to know if that is feasible or not. It doesn't seem practical to me, but maybe Francis or Quentin can comment?

We'd actually want to do something relate to solve https://bugs.llvm.org/show_bug.cgi?id=33868.
We didn't have a chance to look at it yet.

What I'd like to have is a way to simulate critical edge splitting in DT and only materialize the basic block if we end up inserting something into it.

Don't know if splitting critical edges would solve what you are supporting here, though.

IIUC I think it would address our motivating examples, at least the for the shrink-wrap opportunities we were interested in . Can you describe what you had planned a bit more? I believe this is something either I or one of my team members would be willing to pick up.

It seems dupRetToEnableTailCallOpts already does duplicate all the tail-call cases PPC is missing on any of the targets that implement 'mayBeEmittedAsTailCall', so I know to address that.

Revision Contents

Path

Size

lib/

CodeGen/

CodeGenPrepare.cpp

80 lines

test/

CodeGen/

MIR/

Generic/

early-ret-bb.mir

67 lines

PowerPC/

early-exit-shrink-wrap.ll

35 lines

Diff 110295

lib/CodeGen/CodeGenPrepare.cpp

Show First 20 Lines • Show All 223 Lines • ▼ Show 20 Lines	void getAnalysisUsage(AnalysisUsage &AU) const override {
AU.addRequired<ProfileSummaryInfoWrapperPass>();		AU.addRequired<ProfileSummaryInfoWrapperPass>();
AU.addRequired<TargetLibraryInfoWrapperPass>();		AU.addRequired<TargetLibraryInfoWrapperPass>();
AU.addRequired<TargetTransformInfoWrapperPass>();		AU.addRequired<TargetTransformInfoWrapperPass>();
AU.addRequired<LoopInfoWrapperPass>();		AU.addRequired<LoopInfoWrapperPass>();
}		}

private:		private:
bool eliminateFallThrough(Function &F);		bool eliminateFallThrough(Function &F);
		bool tryEarlyReturns(Function &F);
bool eliminateMostlyEmptyBlocks(Function &F);		bool eliminateMostlyEmptyBlocks(Function &F);
BasicBlock findDestBlockOfMergeableEmptyBlock(BasicBlock BB);		BasicBlock findDestBlockOfMergeableEmptyBlock(BasicBlock BB);
bool canMergeBlocks(const BasicBlock BB, const BasicBlock DestBB) const;		bool canMergeBlocks(const BasicBlock BB, const BasicBlock DestBB) const;
void eliminateMostlyEmptyBlock(BasicBlock *BB);		void eliminateMostlyEmptyBlock(BasicBlock *BB);
bool isMergingEmptyBlockProfitable(BasicBlock BB, BasicBlock DestBB,		bool isMergingEmptyBlockProfitable(BasicBlock BB, BasicBlock DestBB,
bool isPreheader);		bool isPreheader);
bool optimizeBlock(BasicBlock &BB, bool &ModifiedDT);		bool optimizeBlock(BasicBlock &BB, bool &ModifiedDT);
bool optimizeInst(Instruction *I, bool &ModifiedDT);		bool optimizeInst(Instruction *I, bool &ModifiedDT);
bool optimizeMemoryInst(Instruction I, Value Addr,		bool optimizeMemoryInst(Instruction I, Value Addr,
Type *AccessTy, unsigned AS);		Type *AccessTy, unsigned AS);
bool optimizeInlineAsmInst(CallInst *CS);		bool optimizeInlineAsmInst(CallInst *CS);
bool optimizeCallInst(CallInst *CI, bool &ModifiedDT);		bool optimizeCallInst(CallInst *CI, bool &ModifiedDT);
bool optimizeExt(Instruction *&I);		bool optimizeExt(Instruction *&I);
bool optimizeExtUses(Instruction *I);		bool optimizeExtUses(Instruction *I);
bool optimizeLoadExt(LoadInst *I);		bool optimizeLoadExt(LoadInst *I);
bool optimizeSelectInst(SelectInst *SI);		bool optimizeSelectInst(SelectInst *SI);
bool optimizeShuffleVectorInst(ShuffleVectorInst *SI);		bool optimizeShuffleVectorInst(ShuffleVectorInst *SI);
bool optimizeSwitchInst(SwitchInst *CI);		bool optimizeSwitchInst(SwitchInst *CI);
bool optimizeExtractElementInst(Instruction *Inst);		bool optimizeExtractElementInst(Instruction *Inst);
bool dupRetToEnableTailCallOpts(BasicBlock *BB);		bool dupRetToEnableTailCallOpts(BasicBlock *BB);
		bool dupRetFedByEmptyBlocks(BasicBlock *BB);
bool placeDbgValues(Function &F);		bool placeDbgValues(Function &F);
bool canFormExtLd(const SmallVectorImpl<Instruction *> &MovedExts,		bool canFormExtLd(const SmallVectorImpl<Instruction *> &MovedExts,
LoadInst &LI, Instruction &Inst, bool HasPromoted);		LoadInst &LI, Instruction &Inst, bool HasPromoted);
bool tryToPromoteExts(TypePromotionTransaction &TPT,		bool tryToPromoteExts(TypePromotionTransaction &TPT,
const SmallVectorImpl<Instruction *> &Exts,		const SmallVectorImpl<Instruction *> &Exts,
SmallVectorImpl<Instruction *> &ProfitablyMovedExts,		SmallVectorImpl<Instruction *> &ProfitablyMovedExts,
unsigned CreatedInstsCost = 0);		unsigned CreatedInstsCost = 0);
bool mergeSExts(Function &F);		bool mergeSExts(Function &F);
▲ Show 20 Lines • Show All 61 Lines • ▼ Show 20 Lines	while (BB != nullptr) {
// bypassSlowDivision may create new BBs, but we don't want to reapply the		// bypassSlowDivision may create new BBs, but we don't want to reapply the
// optimization to those blocks.		// optimization to those blocks.
BasicBlock* Next = BB->getNextNode();		BasicBlock* Next = BB->getNextNode();
EverMadeChange \|= bypassSlowDivision(BB, BypassWidths);		EverMadeChange \|= bypassSlowDivision(BB, BypassWidths);
BB = Next;		BB = Next;
}		}
}		}

		// Before we eliminate empty blocks, try to see if some of them can be
		// converted to return blocks if the target wants that.
		nemanjaiUnsubmitted Not Done Reply Inline Actions I initially meant to add a target hook to see if this is desired by the target, but decided against it as I don't see an issue with always doing so. Depending on the review comments, I can either remove this comment or re-add the target hook. nemanjai: I initially meant to add a target hook to see if this is desired by the target, but decided…
		EverMadeChange \|= tryEarlyReturns(F);

// Eliminate blocks that contain only PHI nodes and an		// Eliminate blocks that contain only PHI nodes and an
// unconditional branch.		// unconditional branch.
EverMadeChange \|= eliminateMostlyEmptyBlocks(F);		EverMadeChange \|= eliminateMostlyEmptyBlocks(F);

// llvm.dbg.value is far away from the value then iSel may not be able		// llvm.dbg.value is far away from the value then iSel may not be able
// handle it properly. iSel will drop llvm.dbg.value if it can not		// handle it properly. iSel will drop llvm.dbg.value if it can not
// find a node corresponding to the value.		// find a node corresponding to the value.
EverMadeChange \|= placeDbgValues(F);		EverMadeChange \|= placeDbgValues(F);
▲ Show 20 Lines • Show All 297 Lines • ▼ Show 20 Lines	for (BasicBlock *Target : Targets) {
}		}

Changed = true;		Changed = true;
}		}

return Changed;		return Changed;
}		}

		/// Try to see if any blocks just branch to a return block that has no code.
		/// It may be beneficial to convert such blocks into return blocks.
		bool CodeGenPrepare::tryEarlyReturns(Function &F) {
		bool MadeChange = false;
		for (BasicBlock &BB : F)
		MadeChange \|= dupRetFedByEmptyBlocks(&BB);
		junbumlUnsubmitted Not Done Reply Inline Actions Isn't it possible to do this in the main loop of eliminateMostlyEmptyBlocks()? junbuml: Isn't it possible to do this in the main loop of eliminateMostlyEmptyBlocks()?
		nemanjaiUnsubmitted Not Done Reply Inline Actions I initially had it there but removed it for two reasons (neither of which I feel too strongly about): It just didn't feel like it should go there since I'm doing an inherently different thing That loop skips the entry block and I didn't see a compelling reason to do so for this. Of course, I could just have one invocation before the loop but this looked cleaner. As I said, I'm not opposed to putting it back. nemanjai: I initially had it there but removed it for two reasons (neither of which I feel too strongly…
		return MadeChange;
		}

/// Eliminate blocks that contain only PHI nodes, debug info directives, and an		/// Eliminate blocks that contain only PHI nodes, debug info directives, and an
/// unconditional branch. Passes before isel (e.g. LSR/loopsimplify) often split		/// unconditional branch. Passes before isel (e.g. LSR/loopsimplify) often split
/// edges in ways that are non-optimal for isel. Start by eliminating these		/// edges in ways that are non-optimal for isel. Start by eliminating these
/// blocks so we can split them the way we want them.		/// blocks so we can split them the way we want them.
bool CodeGenPrepare::eliminateMostlyEmptyBlocks(Function &F) {		bool CodeGenPrepare::eliminateMostlyEmptyBlocks(Function &F) {
SmallPtrSet<BasicBlock *, 16> Preheaders;		SmallPtrSet<BasicBlock *, 16> Preheaders;
SmallVector<Loop *, 16> LoopList(LI->begin(), LI->end());		SmallVector<Loop *, 16> LoopList(LI->begin(), LI->end());
while (!LoopList.empty()) {		while (!LoopList.empty()) {
▲ Show 20 Lines • Show All 1,952 Lines • ▼ Show 20 Lines	bool CodeGenPrepare::dupRetToEnableTailCallOpts(BasicBlock *BB) {

// If we eliminated all predecessors of the block, delete the block now.		// If we eliminated all predecessors of the block, delete the block now.
if (Changed && !BB->hasAddressTaken() && pred_begin(BB) == pred_end(BB))		if (Changed && !BB->hasAddressTaken() && pred_begin(BB) == pred_end(BB))
BB->eraseFromParent();		BB->eraseFromParent();

return Changed;		return Changed;
}		}

		/// Look for opportunities to duplicate a return instruction into a predecessor
		/// that does nothing but branch to the return block. Namely, duplicate a return
		/// statement in both of bb0/bb1 below:
		junbumlUnsubmitted Not Done Reply Inline Actions It seems that you didn't check if the predecessor is empty? Is this what you intended? junbuml: It seems that you didn't check if the predecessor is empty? Is this what you intended?
		nemanjaiUnsubmitted Not Done Reply Inline Actions I initially had a check that the predecessor has a single instruction (i.e. the branch). But as it turns out, the predecessor will actually have a PHI as well in the motivating test case (albeit a PHI with a single entry). Then I got to thinking about whether there's a compelling reason for the predecessor to have to be empty. And I couldn't think of one. I suppose we can check if it: has a single instruction has a single non-PHI instruction has a single instruction or a single predecessor Not sure which of those would be the most appropriate (I am leaning toward the middle one). nemanjai: I initially had a check that the predecessor has a single instruction (i.e. the branch). But as…
		junbumlUnsubmitted Not Done Reply Inline Actions Based on your summary and function name (dupRetFedByEmptyBlocks), I just expected that you check if BB is empty. However, your implementation doesn't seem to really check if BB is empty or not. So, I wanted to understand your intention. I also doubt if BB really need to be a kind of empty? For me, it seems okay to duplicate return to BB even when BB is not empty as you implement right now. I think this need to be done before eliminateMostlyEmptyBlocks(), but BB doesn't need to be empty. Please let me know if there is any downside of doing this when BB is non-empty. junbuml: Based on your summary and function name (dupRetFedByEmptyBlocks), I just expected that you…
		/// @code
		/// bb0:
		/// br label %return
		/// bb1:
		/// br label %return
		/// return:
		/// %retval = phi i32 [ %val1, %bb0 ], [ %val2, %bb1 ]
		/// ret i32 %retval
		/// @endcode
		bool CodeGenPrepare::dupRetFedByEmptyBlocks(BasicBlock *BB) {
		BasicBlock *SingleSucc = BB->getSingleSuccessor();
		DEBUG(dbgs() << "Trying to duplicate the successor's return into: ");
		DEBUG(BB->dump());
		junbumlUnsubmitted Not Done Reply Inline Actions I guess you might reduce DEBUGs in this function. Right? junbuml: I guess you might reduce DEBUGs in this function. Right?
		nemanjaiUnsubmitted Not Done Reply Inline Actions Yeah. It was nice during development to have all of that. And I figured I'd leave it here while this patch is WIP for anyone that wants to play with the patch. But if this becomes a real patch, I'll reduce the DEBUG messages. nemanjai: Yeah. It was nice during development to have all of that. And I figured I'd leave it here while…
		// Look at blocks that have a single successor
		if (!SingleSucc) {
		DEBUG(dbgs() << "Don't have a single successor: ");
		DEBUG(dbgs() << BB->getName() << "\n");
		return false;
		}

		// If the successor does nothing but return, then this instruction can
		// be converted to a return.
		ReturnInst *RetInst = dyn_cast<ReturnInst>(SingleSucc->getTerminator());
		if (!RetInst) {
		DEBUG(dbgs() << "Single successor doesn't return: ");
		DEBUG(dbgs() << SingleSucc->getName() << "\n");
		return false;
		}
		// The SingleSucc block must not have any code in it - just a return,
		// otherwise we're skipping the execution of that code.
		if (SingleSucc->getFirstNonPHIOrDbg() != SingleSucc->getTerminator()) {
		DEBUG(dbgs() << "Single successor has code that we can't just bypass.\n");
		return false;
		}

		// Look at the return value, if it's void - return void.
		Value *RetVal = RetInst->getReturnValue();
		if (!RetVal) {
		DEBUG(dbgs() << "BB before:\n");
		DEBUG(BB->dump());
		(void)FoldReturnIntoUncondBranch(RetInst, SingleSucc, BB);
		DEBUG(dbgs() << "BB after:\n");
		DEBUG(BB->dump());
		ModifiedDT = true;
		return true;
		}

		// Otherwise the return value has to come from a PHI and we can just return
		// the incoming value that comes from this BB.
		PHINode *RetPHI = dyn_cast<PHINode>(RetVal);
		if (RetPHI) {
		DEBUG(dbgs() << "BB before:\n");
		DEBUG(BB->dump());
		(void)FoldReturnIntoUncondBranch(RetInst, SingleSucc, BB);
		DEBUG(dbgs() << "BB after:\n");
		DEBUG(BB->dump());
		ModifiedDT = true;
		return true;
		}
		return false;
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Memory Optimization		// Memory Optimization
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

namespace {		namespace {

/// This is an extended version of TargetLowering::AddrMode		/// This is an extended version of TargetLowering::AddrMode
/// which holds actual Value*'s for register values.		/// which holds actual Value*'s for register values.
▲ Show 20 Lines • Show All 3,901 Lines • Show Last 20 Lines

test/CodeGen/MIR/Generic/early-ret-bb.mir

				# RUN: llc -run-pass codegenprepare -o - %s \| FileCheck %s
				junbumlUnsubmitted Not Done Reply Inline Actions Why do you use .mir, instead of .ll? junbuml: Why do you use .mir, instead of .ll?
				nemanjaiUnsubmitted Not Done Reply Inline Actions I do have a .ll test case as well. I wanted to also provide a .mir test case to test the actual transformation in CGP rather than just its effect on the final ASM. nemanjai: I do have a .ll test case as well. I wanted to also provide a .mir test case to test the actual…
				junbumlUnsubmitted Not Done Reply Inline Actions Why don't you use opt : opt < %s -codegenprepare junbuml: Why don't you use opt : opt < %s -codegenprepare

				--- \|

				define signext i32 @shrinkwrapme(i32 signext %a, i32 signext %lim) {
				entry:
				%cmp5 = icmp sgt i32 %lim, 0
				br i1 %cmp5, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				br label %for.body

				for.cond.cleanup.loopexit: ; preds = %for.body
				%.lcssa = phi i32 [ %0, %for.body ]
				br label %for.cond.cleanup

				for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
				%Ret.0.lcssa = phi i32 [ 0, %entry ], [ %.lcssa, %for.cond.cleanup.loopexit ]
				ret i32 %Ret.0.lcssa

				for.body: ; preds = %for.body, %for.body.preheader
				%lsr.iv = phi i32 [ %lsr.iv.next, %for.body ], [ %lim, %for.body.preheader ]
				%Ret.06 = phi i32 [ %0, %for.body ], [ 0, %for.body.preheader ]
				%0 = add i32 %a, %Ret.06
				%lsr.iv.next = add i32 %lsr.iv, -1
				%exitcond = icmp eq i32 %lsr.iv.next, 0
				br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
				}

				...
				---
				# CHECK: for.cond.cleanup.loopexit
				# CHECK-NEXT: ret i32 %0
				# CHECK: for.cond.cleanup
				# CHECK-NEXT: ret i32 0
				name: shrinkwrapme
				alignment: 4
				exposesReturnsTwice: false
				legalized: false
				regBankSelected: false
				selected: false
				tracksRegLiveness: true
				registers:
				liveins:
				frameInfo:
				isFrameAddressTaken: false
				isReturnAddressTaken: false
				hasStackMap: false
				hasPatchPoint: false
				stackSize: 0
				offsetAdjustment: 0
				maxAlignment: 0
				adjustsStack: false
				hasCalls: false
				stackProtector: ''
				maxCallFrameSize: 4294967295
				hasOpaqueSPAdjustment: false
				hasVAStart: false
				hasMustTailInVarArgFunc: false
				savePoint: ''
				restorePoint: ''
				fixedStack:
				stack:
				constants:
				body: \|

				...

test/CodeGen/PowerPC/early-exit-shrink-wrap.ll

				; RUN: llc -verify-machineinstrs < %s -mtriple=powerpc64le-unknown-unknown -mcpu=pwr8 \| FileCheck %s
				define signext i32 @shrinkwrapme(i32 signext %a, i32 signext %lim) {
				entry:
				%cmp5 = icmp sgt i32 %lim, 0
				br i1 %cmp5, label %for.body.preheader, label %for.cond.cleanup

				for.body.preheader: ; preds = %entry
				br label %for.body

				for.cond.cleanup.loopexit: ; preds = %for.body
				br label %for.cond.cleanup

				for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
				%Ret.0.lcssa = phi i32 [ 0, %entry ], [ %0, %for.cond.cleanup.loopexit ]
				ret i32 %Ret.0.lcssa

				for.body: ; preds = %for.body.preheader, %for.body
				%i.07 = phi i32 [ %inc, %for.body ], [ 0, %for.body.preheader ]
				%Ret.06 = phi i32 [ %0, %for.body ], [ 0, %for.body.preheader ]
				%0 = tail call i32 asm "add $0, $1, $2", "=r,r,r,~{r14},~{r15},~{r16},~{r17},~{r18},~{r19},~{r20},~{r21},~{r22},~{r23},~{r24},~{r25},~{r26},~{r27},~{r28},~{r29},~{r30},~{r31}"(i32 %a, i32 %Ret.06)
				%inc = add nuw nsw i32 %i.07, 1
				%exitcond = icmp eq i32 %inc, %lim
				br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body

				; CHECK-LABEL: shrinkwrapme
				; CHECK: cmpwi 4, 1
				; CHECK-NEXT: blt 0, [[RETB:.LBB[0-9_]+]]
				; CHECK: std 28
				; CHECK: mtctr
				; CHECK: bdnz
				; CHECK: ld 28
				; CHECK: [[RETB]]:
				; CHECK-NEXT: li 3, 0
				; CHECK-NEXT: blr
				}