This is an archive of the discontinued LLVM Phabricator instance.

[SROA] Fold a PHI node if all its incoming values are the same
Closed, Public

Authored by jingyue on Jul 24 2014, 11:07 AM.

Details

Reviewers

chandlerc
eliben
nlewycky
meheff

Commits

rGec33fa9aca4d: [SROA] Fold a PHI node if all its incoming values are the same
rL216299: [SROA] Fold a PHI node if all its incoming values are the same

Summary

Fixes PR20425.

During slice building, if all of the incoming values of a PHI node are the same, replace the PHI node with the common value. This simplification makes alloca's used by PHI nodes easier to promote.
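
As a rough illustration of the fold described in the summary, here is a minimal standalone C++ sketch. It does not use the LLVM API: `Value`, `Phi`, and `commonIncomingValue` are hypothetical stand-ins invented for this example, and comparing values by pointer identity is an assumption of the toy model.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for IR values; these are NOT LLVM classes.
struct Value {
  const char *Name;
};

struct Phi {
  std::vector<const Value *> Incoming;
};

// Returns the single value carried by every incoming edge, or nullptr if the
// PHI has no incoming values or merges at least two different values.
const Value *commonIncomingValue(const Phi &PN) {
  if (PN.Incoming.empty())
    return nullptr;
  const Value *Common = PN.Incoming.front();
  for (const Value *IV : PN.Incoming)
    if (IV != Common)
      return nullptr;
  return Common;
}
```

When the helper returns a non-null value, a caller could treat every use of the PHI as a use of that value, which is the property that makes the underlying alloca easier to promote.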

Event Timeline

jingyue updated this revision to Diff 11851. Jul 24 2014, 11:07 AM

jingyue retitled this revision to [SROA] Simplify PHI nodes before promoting the alloca.

jingyue updated this object.

jingyue edited the test plan for this revision.

jingyue added reviewers: chandlerc, nlewycky, eliben, meheff.

jingyue added a subscriber: Unknown Object (MLST).

eliben added inline comments. Jul 24 2014, 11:35 AM

lib/Transforms/Scalar/SROA.cpp
3112	by "of the PHI nodes" do you mean "PN"?
3113	Please document all the arguments of this function
3151	Does the logic have to be inverted? If you only use this method in a "!" clause, can't it check positively - i.e. origination from only NewAI? This can make reasoning about it simpler.
3154	whitespace around ":"

Just a couple suggestions which might simplify the code.

lib/Transforms/Scalar/SROA.cpp
3117	Couple bike-shed suggestions: Maybe name the function PHINodeEquivalentToSource? You could also replace both ptr sets with a map<PHINode *, bool> where membership indicates it's been visited and the value indicates equivalence.
3152	If you use a map<PHINode *, bool> for equivalence as mentioned above, you can cleanly do away with ToDelete. Just iterate through the map and replace the nodes with a true value. I think those will necessarily all be in PHIUsers.
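
To make the single-map suggestion concrete, here is a small standalone C++ sketch. It only shows the bookkeeping, not the equivalence analysis itself. `PhiNode`, `EquivalenceMap`, and its member names are hypothetical and do not come from the patch; `std::unordered_map` stands in for whatever map type the real code would use.

```cpp
#include <cassert>
#include <unordered_map>
#include <vector>

// Toy stand-in, not llvm::PHINode.
struct PhiNode {
  int Id;
};

// One map replaces two sets: a key being present means "visited", and the
// mapped bool records whether the node was found equivalent to the source.
class EquivalenceMap {
  std::unordered_map<const PhiNode *, bool> State;

public:
  void record(const PhiNode *PN, bool Equivalent) { State[PN] = Equivalent; }

  bool visited(const PhiNode *PN) const { return State.count(PN) != 0; }

  bool equivalent(const PhiNode *PN) const {
    auto It = State.find(PN);
    return It != State.end() && It->second;
  }

  // The nodes to replace are exactly the entries mapped to true, so no
  // separate to-delete list has to be maintained alongside the map.
  std::vector<const PhiNode *> nodesToReplace() const {
    std::vector<const PhiNode *> Out;
    for (const auto &Entry : State)
      if (Entry.second)
        Out.push_back(Entry.first);
    return Out;
  }
};
```

Note that iteration order over `std::unordered_map` is unspecified, so a real pass that rewrites IR while iterating would probably want an order-preserving container.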

Thanks Eli and Mark for your comments! I believe the logic is simpler now.

jingyue added inline comments. Jul 24 2014, 2:38 PM

lib/Transforms/Scalar/SROA.cpp
3112	Changed the comment to "incoming values of PN".
3113	done
3117	Merged these data structures to one DenseMap, and renamed the function to PHINodeEquivalentToNewAlloca.
3151	done
3152	Agreed. Done.
3154	Ack'ed.

chandlerc added inline comments. Jul 24 2014, 2:39 PM

lib/Transforms/Scalar/SROA.cpp
3108–3113	I think this is the wrong approach in several ways. Depth first search when there is an early exit shortcut is usually more expensive than testing the predicate breadth first while adding the nodes to a worklist. Also you should use a worklist rather than recursion or you would easily blow out the stack. Finally, I agree that we shouldn't need a set for originating from other and just a visit.... But before we dive into the details to fix these issues -- why is this the right approach? Walking up from the PHI operands seems necessarily much more work than walking down from the alloca. In fact, we're already going to walk down from the alloca because we iterate on every non-promoted alloca. Why can't we fold no-op PHI nodes in the partition builder the same way we do no-op selects? I think that would solve the problem you have here with no extra use-list traversal. For future reference, keep in mind that uselist traversal is a cache-hostile thing. We should minimize the number of traversals and merge traversals where possible.
3113	We don't actually insist on that usually. I'd rather the arguments have descriptive names than comments personally.
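
The worklist point can be sketched in standalone C++ as follows. This is a toy use graph rather than LLVM IR: `Node`, its `Safe` flag, and `allReachableSafe` are hypothetical names chosen for the example, and the "predicate" is reduced to a boolean field.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_set>
#include <vector>

// Toy use graph, not the LLVM API: Users plays the role of a use list.
struct Node {
  bool Safe = true;
  std::vector<Node *> Users;
};

// Returns true if Root and everything reachable through Users is Safe.
// The predicate is tested as nodes are discovered, so an unsafe node ends the
// walk before its own users are queued, and the explicit worklist keeps the
// call stack flat no matter how long a use chain is.
bool allReachableSafe(Node *Root) {
  if (!Root->Safe)
    return false;
  std::deque<Node *> Worklist;
  std::unordered_set<Node *> Visited;
  Worklist.push_back(Root);
  Visited.insert(Root);
  while (!Worklist.empty()) {
    Node *N = Worklist.front();
    Worklist.pop_front();
    for (Node *U : N->Users) {
      if (!U->Safe)
        return false;
      // insert() reports whether U was newly added, which doubles as the
      // "already visited" check and terminates cycles.
      if (Visited.insert(U).second)
        Worklist.push_back(U);
    }
  }
  return true;
}
```

A recursive version of the same walk would use one stack frame per node on the longest path, which is the stack blow-up the comment warns about.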

Chandler, thanks for your suggestions! I think moving the logic to SliceBuilder makes perfect sense.

After modifying SliceBuilder::visitPHINode, I found it almost the same as visitSelectInst. Therefore, I also refactored the code to merge visitPHINode and visitSelectInst to one function. Let me know if it looks good to you.

Yea, this looks nice. I have a bunch of nit-picky comments below... Nothing really significant.

lib/Transforms/Scalar/SROA.cpp
332	Just use nullptr as the initial value?
333	I prefer to only use "I" (capitalized) for iterators. I'd use 'i' or 'Idx' here. (and we should really add an iterator and range adaptor set to PHINode for these so you can use a range based for loop...)
336–344	I think this would be slightly simpler as: if (isa<UndefValue>(IV)) continue; if (!CommonValue) CommonValue = IV; else if (CommonValue != IV) return nullptr;
657–663	I'd probably put this in a 'foldPHINodeOrSelectInst' static helper so you can write the if below: if (Value *Result = foldPHINodeOrSelectInst(I)) {
688–691	How about just "Size"?
test/Transforms/SROA/phi-and-select.ll
567–568	I like to have all my CHECKs within the function body so I can find them and they don't get spliced. I also try to attach them to the instruction they reference. Can you put the label at the start of the function and then the -NOT after the alloca?

All comments addressed

jingyue added inline comments. Jul 26 2014, 7:55 AM

lib/Transforms/Scalar/SROA.cpp
332	That would make foldPHINode return null for phi(undef, undef).
333	Done. I agree. Will add a range for PHINode in a separate patch.
336–344	Done
657–663	done
688–691	done.
test/Transforms/SROA/phi-and-select.ll
567–568	done
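
For readers following the exchange on lines 332 and 336–344, here is a standalone C++ sketch of the loop shape suggested there, written against a toy model instead of the LLVM API. The names `CommonValue` and `IV` come from the review comment; `Value`, `IsUndef`, and `foldPhi` are hypothetical. The sketch starts from `nullptr`, so it also shows the behavior jingyue points out: a PHI whose incoming values are all undef yields `nullptr`.

```cpp
#include <cassert>
#include <vector>

// Toy stand-in for an IR value; IsUndef models isa<UndefValue>(IV).
struct Value {
  bool IsUndef = false;
};

// Returns the one non-undef value shared by all incoming edges, ignoring
// undef inputs. Returns nullptr if two different non-undef values are merged,
// and also if every input is undef (nothing ever sets CommonValue).
const Value *foldPhi(const std::vector<const Value *> &Incoming) {
  const Value *CommonValue = nullptr;
  for (const Value *IV : Incoming) {
    if (IV->IsUndef)
      continue;
    if (!CommonValue)
      CommonValue = IV;
    else if (CommonValue != IV)
      return nullptr;
  }
  return CommonValue;
}
```

Whether returning `nullptr` for the all-undef case is acceptable depends on the caller; the final diff sidesteps the question by calling `PN->hasConstantValue()` instead of an open-coded loop.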

Hi Duncan,

I tried SimplifyPHINode and it worked pretty well. Thanks!

That makes me consider using SimplifySelectInst on select instructions too. However, I found one regression test PR16651.2 would fail after this potential modification. We would transform

define void @PR16651.2() {
; This test case caused a crash due to failing to promote given a select that
; can't be speculated. It shouldn't be promoted, but we missed that fact when
; analyzing whether we could form a vector promotion because that code didn't
; bail on select instructions.
;
; CHECK-LABEL: @PR16651.2(
; CHECK: alloca <2 x float>
; CHECK: ret void

entry:
  %tv1 = alloca { <2 x float>, <2 x float> }, align 8
  %0 = getelementptr { <2 x float>, <2 x float> }* %tv1, i64 0, i32 1
  store <2 x float> undef, <2 x float>* %0, align 8
  %1 = getelementptr inbounds { <2 x float>, <2 x float> }* %tv1, i64 0, i32 1, i64 0
  %cond105.in.i.i = select i1 undef, float* null, float* %1
  %cond105.i.i = load float* %cond105.in.i.i, align 8
  ret void
}

to

define void @PR16651.2() {
entry:
  %cond105.in.i.i = select i1 undef, float* null, float* undef
  %cond105.i.i = load float* %cond105.in.i.i, align 8
  ret void
}

Is this transformation on PR16651.2 valid? If not, can somebody help me understand why it isn't?

Thanks,
Jingyue

----- Original Message -----

From: "Jingyue Wu" <jingyue@google.com>
To: jingyue@google.com, nlewycky@google.com, eliben@google.com, meheff@google.com, chandlerc@gmail.com
Cc: llvm-commits@cs.uiuc.edu
Sent: Sunday, July 27, 2014 11:21:24 AM
Subject: Re: [PATCH] [SROA] Fold a PHI node if all its incoming values are the same

It looks like this transformation is valid only if loading from an undef address always produces an undef value (and never traps). In the original code, loading from %1 always produced an undef value (because undef was explicitly stored into that location, but would have been undef anyway because it is an otherwise-fresh alloca), but did not trap. So if we always replaced undef addresses with some equivalently-sized constant-pool load (or something like that), then this might work. I don't know that we do that, however.

-Hal

http://reviews.llvm.org/D4659

To avoid redundancy, use SimplifyInstruction to fold PHI nodes and select instructions. One tricky case is when clobbering one operand in

select undef, a, b

we need to have the select always return the other operand by setting the condition "undef" to 0 or 1; otherwise, the transformation of clobbering dead operands may introduce new undefined behavior. Added test @simplify_undef in phi-and-select.ll to cover this case.

Updated basictest.ll per this change.

Can anyone take a look at the latest diff?

In the latest diff, I use SimplifyInstruction to fold both PHI nodes and select instructions. Doing so requires a minor fix in the logic of clobbering dead uses in select instructions: to clobber x/y in "select undef, x, y", we should change the condition from "undef" to 0/1. Otherwise, "select undef, undef, y" or "select undef, x, undef" can introduce more undefined behavior. This wasn't an issue in the old code, because the old code does not fold a select with undef condition.
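
The hazard described in this comment can be modeled with a short standalone C++ sketch. It is a toy, not SROA code: operands are plain strings, the string "undef" stands for an undef operand, and `Select`, `possibleResults`, and the two clobber helpers are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>

enum class Cond { Undef, True, False };

// Toy model of "select C, TrueVal, FalseVal"; operands are just names.
struct Select {
  Cond C;
  std::string TrueVal, FalseVal;
};

// Every operand the select may evaluate to. An undef condition is allowed to
// pick either side.
std::set<std::string> possibleResults(const Select &S) {
  std::set<std::string> R;
  if (S.C != Cond::False)
    R.insert(S.TrueVal);
  if (S.C != Cond::True)
    R.insert(S.FalseVal);
  return R;
}

// Clobbers only the dead true operand. With an undef condition the select can
// still evaluate to the clobbered operand.
Select clobberTrueOperandOnly(Select S) {
  S.TrueVal = "undef";
  return S;
}

// Clobbers the dead true operand and pins an undef condition to the side
// that survives, so the clobbered operand can no longer be chosen.
Select clobberTrueOperandAndPinCondition(Select S) {
  S.TrueVal = "undef";
  if (S.C == Cond::Undef)
    S.C = Cond::False;
  return S;
}
```

Starting from `select undef, x, y`, the first helper leaves "undef" among the possible results, while the second leaves only `y`. A load through the first form could therefore see an undef address that the original program never produced.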

ping?

Sorry, I've been chasing other bugs. Will look today.

Hi Chandler,

I hope your bug chasing is going well. Wondering if you can get a chance to take a look at this patch.

Thanks a lot,
Jingyue

zinovy.nis added a subscriber: zinovy.nis. Aug 18 2014, 6:39 AM

ping?

Sorry. I tried to get back to this a couple of times but it required a lot of thought. And then your pings kept hitting me at bad times. =/

lib/Transforms/Scalar/SROA.cpp
680–699	This is pretty horrible. It makes me question the entire approach of cleaning up instructions here. I hate phase ordering. If we're going to go down the rabbit hole here of using instsimplify within SROA to address phase ordering issues, I think it would be much cleaner to do as part of the recursive user walk for the pointer value, and to actually RAUW the values as they simplify rather than doing the dead-operand-tracking thing here. Consider -- if we don't RAUW then we won't notice if doing so would cause some user of the select (perhaps another select or PHI) to collapse to all inputs being the same. But that requires parameterizing PtrUseVisitor, teaching it to use instsimplify, teaching it to track RAUWs correctly, etc etc etc. Yuck. Absolutely terrible. This is why phase ordering problems are hard. Ultimately, I think the suggestion to use instsimplify is wrong here. We really only want to handle the simplifications that arise because of mem2reg, and the primary one there is when all inputs to the PHI node become the same because we promoted them all to the same register (rather than N different loads). We don't need or want to handle the more complex simplifications because other passes will. If we want to make SROA really powerful against phase ordering issues like this, I think it would need serious architectural changes. So can we go back to the simpler patch after my round of nit-picky comments? Because I suspect that one is essentially "ready to go".
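
The cascading effect mentioned in this comment (a user collapsing only after an earlier fold has been RAUW'ed into it) can be shown with a standalone C++ toy. Nothing here is LLVM API: PHIs are entries in a `std::map` keyed by name, and `PhiMap`, `commonIncoming`, and `foldWithRAUW` are hypothetical helpers written for this sketch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy function body: each PHI name maps to the names of its incoming values.
using PhiMap = std::map<std::string, std::vector<std::string>>;

// Returns the shared incoming name if all incoming values agree, else "".
std::string commonIncoming(const std::vector<std::string> &In) {
  if (In.empty())
    return "";
  for (const std::string &V : In)
    if (V != In.front())
      return "";
  return In.front();
}

// Folds PHIs to a fixpoint, rewriting every use of a folded PHI to its
// replacement. Returns a map from folded PHI name to replacement name.
std::map<std::string, std::string> foldWithRAUW(PhiMap Phis) {
  std::map<std::string, std::string> Folded;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (auto &Entry : Phis) {
      const std::string &Name = Entry.first;
      if (Folded.count(Name))
        continue;
      std::string Common = commonIncoming(Entry.second);
      if (Common.empty() || Common == Name)
        continue;
      // Earlier folds that pointed at this PHI now point at its replacement.
      for (auto &Done : Folded)
        if (Done.second == Name)
          Done.second = Common;
      Folded[Name] = Common;
      // The toy equivalent of replaceAllUsesWith over the remaining PHIs.
      for (auto &Other : Phis)
        for (std::string &V : Other.second)
          if (V == Name)
            V = Common;
      Changed = true;
    }
  }
  return Folded;
}
```

With `p1 = phi(a, a)` and `p2 = phi(p1, a)`, a one-shot check sees no common value for `p2`, but after `p1` is replaced by `a` the second PHI becomes `phi(a, a)` and folds too. That is the kind of collapse a pure dead-operand-tracking scheme would miss.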

Thanks for the review, Chandler.

I see what you are saying. I agree the current dead-operand-tracking mechanism is a little cumbersome for the use of SimplifyInstruction.

I reverted the use of SimplifyInstruction and left the architectural changes as TODO. PTAL.

Looks great, thanks!

This revision is now accepted and ready to land. Aug 22 2014, 2:29 PM

jingyue closed this revision. Aug 22 2014, 3:55 PM

Revision Contents

Path	Size
lib/Transforms/Scalar/SROA.cpp	82 lines
test/Transforms/SROA/phi-and-select.ll	65 lines

Diff 12856

lib/Transforms/Scalar/SROA.cpp

@@ static Value *foldSelectInst(SelectInst &SI) {
   if (ConstantInt *CI = dyn_cast<ConstantInt>(SI.getCondition()))
     return SI.getOperand(1+CI->isZero());
   if (SI.getOperand(1) == SI.getOperand(2))
     return SI.getOperand(1);

   return nullptr;
 }

+/// \brief A helper that folds a PHI node or a select.
+static Value *foldPHINodeOrSelectInst(Instruction &I) {
+  if (PHINode *PN = dyn_cast<PHINode>(&I)) {
+    // If PN merges together the same value, return that value.
+    return PN->hasConstantValue();
+  }
+  return foldSelectInst(cast<SelectInst>(I));
+}
+
 /// \brief Builder for the alloca slices.
 ///
 /// This class builds a set of alloca slices by recursively visiting the uses
 /// of an alloca and making a slice for each load and store at each offset.
 class AllocaSlices::SliceBuilder : public PtrUseVisitor<SliceBuilder> {
   friend class PtrUseVisitor<SliceBuilder>;
   friend class InstVisitor<SliceBuilder>;
   typedef PtrUseVisitor<SliceBuilder> Base;

   const uint64_t AllocSize;
   AllocaSlices &S;

   SmallDenseMap<Instruction *, unsigned> MemTransferSliceMap;
   SmallDenseMap<Instruction *, uint64_t> PHIOrSelectSizes;

   /// \brief Set to de-duplicate dead instructions found in the use walk.
   SmallPtrSet<Instruction *, 4> VisitedDeadInsts;
@@ do {

       for (User *U : I->users())
         if (Visited.insert(cast<Instruction>(U)))
           Uses.push_back(std::make_pair(I, cast<Instruction>(U)));
     } while (!Uses.empty());

     return nullptr;
   }

-  void visitPHINode(PHINode &PN) {
-    if (PN.use_empty())
-      return markAsDead(PN);
-    if (!IsOffsetKnown)
-      return PI.setAborted(&PN);
-
-    // See if we already have computed info on this node.
-    uint64_t &PHISize = PHIOrSelectSizes[&PN];
-    if (!PHISize) {
-      // This is a new PHI node, check for an unsafe use of the PHI node.
-      if (Instruction *UnsafeI = hasUnsafePHIOrSelectUse(&PN, PHISize))
-        return PI.setAborted(UnsafeI);
-    }
-
-    // For PHI and select operands outside the alloca, we can't nuke the entire
-    // phi or select -- the other side might still be relevant, so we special
-    // case them here and use a separate structure to track the operands
-    // themselves which should be replaced with undef.
-    // FIXME: This should instead be escaped in the event we're instrumenting
-    // for address sanitization.
-    if (Offset.uge(AllocSize)) {
-      S.DeadOperands.push_back(U);
-      return;
-    }
-
-    insertUse(PN, Offset, PHISize);
-  }
-
-  void visitSelectInst(SelectInst &SI) {
-    if (SI.use_empty())
-      return markAsDead(SI);
-    if (Value *Result = foldSelectInst(SI)) {
+  void visitPHINodeOrSelectInst(Instruction &I) {
+    assert(isa<PHINode>(I) || isa<SelectInst>(I));
+    if (I.use_empty())
+      return markAsDead(I);
+
+    // TODO: We could use SimplifyInstruction here to fold PHINodes and
+    // SelectInsts. However, doing so requires to change the current
+    // dead-operand-tracking mechanism. For instance, suppose neither loading
+    // from %U nor %other traps. Then "load (select undef, %U, %other)" does not
+    // trap either. However, if we simply replace %U with undef using the
+    // current dead-operand-tracking mechanism, "load (select undef, undef,
+    // %other)" may trap because the select may return the first operand
+    // "undef".
+    if (Value *Result = foldPHINodeOrSelectInst(I)) {
       if (Result == *U)
         // If the result of the constant fold will be the pointer, recurse
-        // through the select as if we had RAUW'ed it.
-        enqueueUsers(SI);
+        // through the PHI/select as if we had RAUW'ed it.
+        enqueueUsers(I);
       else
-        // Otherwise the operand to the select is dead, and we can replace it
-        // with undef.
+        // Otherwise the operand to the PHI/select is dead, and we can replace
+        // it with undef.
         S.DeadOperands.push_back(U);

       return;
     }

     if (!IsOffsetKnown)
-      return PI.setAborted(&SI);
+      return PI.setAborted(&I);

     // See if we already have computed info on this node.
-    uint64_t &SelectSize = PHIOrSelectSizes[&SI];
-    if (!SelectSize) {
-      // This is a new Select, check for an unsafe use of it.
-      if (Instruction *UnsafeI = hasUnsafePHIOrSelectUse(&SI, SelectSize))
+    uint64_t &Size = PHIOrSelectSizes[&I];
+    if (!Size) {
+      // This is a new PHI/Select, check for an unsafe use of it.
+      if (Instruction *UnsafeI = hasUnsafePHIOrSelectUse(&I, Size))
         return PI.setAborted(UnsafeI);
     }

     // For PHI and select operands outside the alloca, we can't nuke the entire
     // phi or select -- the other side might still be relevant, so we special
     // case them here and use a separate structure to track the operands
     // themselves which should be replaced with undef.
     // FIXME: This should instead be escaped in the event we're instrumenting
     // for address sanitization.
     if (Offset.uge(AllocSize)) {
       S.DeadOperands.push_back(U);
       return;
     }

-    insertUse(SI, Offset, SelectSize);
+    insertUse(I, Offset, Size);
+  }
+
+  void visitPHINode(PHINode &PN) {
+    visitPHINodeOrSelectInst(PN);
+  }
+
+  void visitSelectInst(SelectInst &SI) {
+    visitPHINodeOrSelectInst(SI);
   }

   /// \brief Disable SROA entirely if there are unhandled users of the alloca.
   void visitInstruction(Instruction &I) {
     PI.setAborted(&I);
   }
 };

@@ StructType *SubTy = StructType::get(STy->getContext(), makeArrayRef(EI, EE),
                                       STy->isPacked());
   const StructLayout *SubSL = DL.getStructLayout(SubTy);
   if (Size != SubSL->getSizeInBytes())
     return nullptr; // The sub-struct doesn't have quite the size needed.

   return SubTy;
 }

 /// \brief Rewrite an alloca partition's users.
 ///
 /// This routine drives both of the rewriting goals of the SROA pass. It tries
 /// to rewrite uses of an alloca partition to be conducive for SSA value
 /// promotion. If the partition needs a new, more refined alloca, this will
 /// build that new alloca, preserving as much type information as possible, and
 /// rewrite the uses of the old alloca to point at the new one and have the
 /// appropriate new offsets. It also evaluates how successful the rewrite was
 /// at enabling promotion and if it was successful queues the alloca to be
 /// promoted.
 bool SROA::rewritePartition(AllocaInst &AI, AllocaSlices &S,
                             AllocaSlices::iterator B, AllocaSlices::iterator E,
                             int64_t BeginOffset, int64_t EndOffset,
                             ArrayRef<AllocaSlices::iterator> SplitUses) {
   assert(BeginOffset < EndOffset);
   uint64_t SliceSize = EndOffset - BeginOffset;

   // Try to compute a friendly type for this partition of the alloca. This
@@ bool SROA::rewritePartition(AllocaInst &AI, AllocaSlices &S,

   bool IsVectorPromotable = isVectorPromotionViable(
       *DL, SliceTy, S, BeginOffset, EndOffset, B, E, SplitUses);

   bool IsIntegerPromotable =
       !IsVectorPromotable &&
       isIntegerWideningViable(*DL, SliceTy, BeginOffset, S, B, E, SplitUses);

   // Check for the case where we're going to rewrite to a new alloca of the
   // exact same type as the original, and with the same access offsets. In that
   // case, re-use the existing alloca, but still run through the rewriter to
   // perform phi and select speculation.
   AllocaInst *NewAI;
   if (SliceTy == AI.getAllocatedType()) {
     assert(BeginOffset == 0 &&
            "Non-zero begin offset but same alloca type");
     NewAI = &AI;
     // FIXME: We should be able to bail at this point with "nothing changed".
     // FIXME: We might want to defer PHI speculation until after here.
   } else {

test/Transforms/SROA/phi-and-select.ll

@@
 end:
   %a.phi.f = phi float* [ %a.f, %then ], [ %a.raw.4.f, %else ]
   %f = load float* %a.phi.f
   ret float %f
 ; CHECK: %[[phi:.*]] = phi float [ %[[lo_cast]], %then ], [ %[[hi_cast]], %else ]
 ; CHECK-NOT: load
 ; CHECK: ret float %[[phi]]
 }
+
+; Verifies we fixed PR20425. We should be able to promote all alloca's to
+; registers in this test.
+;
+; %0 = slice
+; %1 = slice
+; %2 = phi(%0, %1) // == slice
+define float @simplify_phi_nodes_that_equal_slice(i1 %cond, float* %temp) {
+; CHECK-LABEL: @simplify_phi_nodes_that_equal_slice(
+entry:
+  %arr = alloca [4 x float], align 4
+; CHECK-NOT: alloca
+  br i1 %cond, label %then, label %else
+
+then:
+  %0 = getelementptr inbounds [4 x float]* %arr, i64 0, i64 3
+  store float 1.000000e+00, float* %0, align 4
+  br label %merge
+
+else:
+  %1 = getelementptr inbounds [4 x float]* %arr, i64 0, i64 3
+  store float 2.000000e+00, float* %1, align 4
+  br label %merge
+
+merge:
+  %2 = phi float* [ %0, %then ], [ %1, %else ]
+  store float 0.000000e+00, float* %temp, align 4
+  %3 = load float* %2, align 4
+  ret float %3
+}
+
+; A slightly complicated example for PR20425.
+;
+; %0 = slice
+; %1 = phi(%0) // == slice
+; %2 = slice
+; %3 = phi(%1, %2) // == slice
+define float @simplify_phi_nodes_that_equal_slice_2(i1 %cond, float* %temp) {
+; CHECK-LABEL: @simplify_phi_nodes_that_equal_slice_2(
+entry:
+  %arr = alloca [4 x float], align 4
+; CHECK-NOT: alloca
+  br i1 %cond, label %then, label %else
+
+then:
+  %0 = getelementptr inbounds [4 x float]* %arr, i64 0, i64 3
+  store float 1.000000e+00, float* %0, align 4
+  br label %then2
+
+then2:
+  %1 = phi float* [ %0, %then ]
+  store float 2.000000e+00, float* %1, align 4
+  br label %merge
+
+else:
+  %2 = getelementptr inbounds [4 x float]* %arr, i64 0, i64 3
+  store float 3.000000e+00, float* %2, align 4
+  br label %merge
+
+merge:
+  %3 = phi float* [ %1, %then2 ], [ %2, %else ]
+  store float 0.000000e+00, float* %temp, align 4
+  %4 = load float* %3, align 4
+  ret float %4
+}