This is an archive of the discontinued LLVM Phabricator instance.


Differential D29896

[BypassSlowDivision] Refactor fast division insertion logic (NFC)
ClosedPublic

Authored by n.bozhenov on Feb 13 2017, 9:49 AM.


Details

Reviewers

jlebar

Commits

rGd4b12b3348be: [BypassSlowDivision] Refactor fast division insertion logic (NFC)
rL296828: [BypassSlowDivision] Refactor fast division insertion logic (NFC)

Summary

The most important goal of the patch is to break the large insertFastDiv function into separate pieces, so that a different fast insertion logic can later be implemented using some of these pieces.

Diff Detail

Event Timeline

n.bozhenov created this revision. Feb 13 2017, 9:49 AM

n.bozhenov mentioned this in D29897: [BypassSlowDivision] Use ValueTracking to simplify run-time checks. Feb 13 2017, 9:55 AM

n.bozhenov added a child revision: D29897: [BypassSlowDivision] Use ValueTracking to simplify run-time checks.

n.bozhenov mentioned this in D28199: [BypassSlowDivision] Use ValueTracking to simplify run-time checks. Feb 13 2017, 10:01 AM

jlebar added inline comments. Feb 13 2017, 4:26 PM

lib/Transforms/Utils/BypassSlowDivision.cpp
139	Can we add a "// anonymous namespace" comment?
346	I'm totally behind the notion of creating classes that encapsulate our temporary state. But I'm not wild about the way we use FastDivInsertionTask as basically a bucket of mutable state. We have had really nasty bugs in LLVM with similar designs where we forget to reset one piece of the mutable state. (I think Chandler was saying that some pass effectively had a global cost cap instead of a per-function cost cap, because they just forgot to set the cost to 0 at the end of the function.) I went through a few iterations, and the one that seems best to me right now is to get rid of FastDivInserter, which is already very simple, and make FastDivInsertionTask a one-shot class.

	    DivCacheTy Cache;
	    while (Next != nullptr) {
	      Instruction *I = Next;
	      Next = Next->getNextNode();
	      if (Value *Replacement = FastDivInsertionTask(I, Cache, BypassWidths).getReplacement()) {
	        I->RAUW(Replacement);
	        MadeChange |= true;
	      }
	    }

	WDYT?
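Editorial sketch: the "forgot to reset the cost" failure mode mentioned above, and how a one-shot object avoids it, can be illustrated outside LLVM. Everything here (`CostTask`, `countWithinCap`, the cost numbers) is hypothetical and is not code from the patch; it only assumes C++17.

```cpp
#include <cassert>
#include <vector>

// Hypothetical one-shot task: all per-item state lives in an object that is
// constructed for exactly one item, so there is nothing to reset afterwards.
class CostTask {
  const std::vector<unsigned> &Costs; // costs of the instructions of one item
  unsigned Total = 0;                 // starts at zero for every new task

public:
  explicit CostTask(const std::vector<unsigned> &C) : Costs(C) {}

  // Returns true when the summed cost of this item stays within Cap.
  bool withinCap(unsigned Cap) {
    for (unsigned C : Costs)
      Total += C;
    return Total <= Cap;
  }
};

// Driver shaped like the loop in the comment: a fresh task per item.
inline unsigned countWithinCap(const std::vector<std::vector<unsigned>> &Items,
                               unsigned Cap) {
  unsigned Count = 0;
  for (const std::vector<unsigned> &Item : Items)
    if (CostTask(Item).withinCap(Cap))
      ++Count;
  return Count;
}
```

If `Total` were instead a member of a long-lived object shared by all items, the cap would silently apply to the running sum across items, which is the kind of bug the reviewer describes.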

jlebar added inline comments. Feb 13 2017, 4:29 PM

lib/Transforms/Utils/BypassSlowDivision.cpp
80	This is actually SlowInstr or InstrToReplace or SlowDivOrRem -- we should be careful here and elsewhere not to call things divs that may be rems.
92	I would feel a lot more comfortable if our constructor initialized all of these variables. Otherwise it seems like we're asking for trouble.
98	"Runtime" as in "runtime check" is usually one word.
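Editorial sketch: one way to address the concern about uninitialized members is to give every field a default member initializer, so that a newly added field cannot be left indeterminate. The class below is a hypothetical stand-in (`ToyValue` and `ToyTask` are not LLVM types, and the field names merely echo those in the diff).

```cpp
#include <cassert>

// Hypothetical stand-in for llvm::Value; only its address matters here.
struct ToyValue {};

class ToyTask {
public:
  // The constructor sets the one field that depends on its argument;
  // every other field falls back to its in-class initializer.
  explicit ToyTask(ToyValue *Slow) : SlowDivOrRem(Slow) {
    assert(Slow && "a task needs an instruction to work on");
  }

  // True once both the short and the long quotient have been produced.
  bool hasResults() const { return ShortQuotient && LongQuotient; }

  ToyValue *SlowDivOrRem = nullptr;
  ToyValue *ShortQuotient = nullptr;
  ToyValue *ShortRemainder = nullptr;
  ToyValue *LongQuotient = nullptr;
  ToyValue *LongRemainder = nullptr;
};
```

With this layout, forgetting to mention a member in the constructor leaves it null rather than uninitialized.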

n.bozhenov added inline comments. Feb 14 2017, 1:51 PM

lib/Transforms/Utils/BypassSlowDivision.cpp
346	Hi Justin, Thanks for reviewing the code so quickly. And I really like the code sample you suggested. If I understand correctly, your main concern is that in my code the Task object gets reused for different instructions during the pass, isn't it? If this is the case, then moving the code from FastDivInserter into the FastDivInsertionTask constructor is indeed one of the possible solutions. Another approach is to create a new Task object for each instruction inside a FastDivInserter object and never expose it outside. Something like this:

	    // In bypassSlowDivision:
	    FastDivInserter FDI;
	    while (Next != nullptr) {
	      Instruction *I = Next;
	      Next = Next->getNextNode();
	      MadeChange |= FDI.tryReplaceSlowDiv(I);
	    }

	    // In FastDivInserter::tryReplaceSlowDiv:
	    FastDivInsertionTask Task(I, BypassWidths);
	    if (!Task.isSlowDivision())
	      return false;
	    Value *R = Task.getReplacement(Cache);
	    if (!R)
	      return false;
	    I->RAUW(R);
	    return true;

	Currently I'm playing with these two approaches and will come back with an updated patch by tomorrow. Please comment if you're strongly in favour of one of these approaches.

jlebar added inline comments. Feb 14 2017, 1:56 PM

lib/Transforms/Utils/BypassSlowDivision.cpp
346	> If I understand correctly, your main concern is that in my code the Task object gets reused for different instructions during the pass, isn't it?

	Yes. I'm also a little concerned about the mutable state inside of the Task object used for passing between functions, but I wanted to get this part figured out first.

	> Another approach is to create a new Task object for each instruction inside a FastDivInserter object and never expose it outside

	Right. If FastDivInserter has nontrivial complexity that it doesn't make sense to push into FastDivInsertionTask or the outer pass function body, this might make sense. When I looked at it, I didn't think it did.

n.bozhenov updated this revision to Diff 88552. Feb 15 2017, 9:08 AM

n.bozhenov added inline comments. Feb 15 2017, 9:12 AM

lib/Transforms/Utils/BypassSlowDivision.cpp
346	I have updated the patch. I have got rid of the FastDivInserter class. And the Task is not reused any more; a new object is constructed for each instruction. As for the mutable fields used for passing data between functions, it's possible to get rid of them as well. We could define

	    struct IncomingDivRemPair {
	      BasicBlock *BB; //< PHINode predecessor for the following values.
	      Value *Quotient;
	      Value *Remainder;
	    };

	and return such a structure by value from the createSlowBB and createFastBB methods. Later, we could pass a pair of such structures into createDivRemPhiNodes to make the latter both more flexible and easier to understand. The only reason I haven't done this yet is that I don't much like returning structures by value. Do you think it is worth doing?

n.bozhenov marked 3 inline comments as done. Feb 15 2017, 9:20 AM

n.bozhenov added inline comments.

lib/Transforms/Utils/BypassSlowDivision.cpp
80	During this optimization we generally use 'slow division' to refer to both Div and Rem operations. I believe that strictly following your suggestion would be overkill. For example, the file itself and its main entry point are named BypassSlowDivision.cpp and llvm::bypassSlowDivision. I'm not sure if we should rename them into something like bypassSlowDivisionOrRemainder. To make the naming somewhat less confusing I suggest using the full word (Division) to refer to both Div and Rem operations, and use the abbreviations (Div/Rem) to refer to specific operation kinds. According to that, SlowDiv should be renamed into SlowDivision, FastDivInsertionTask into FastDivisionInsertionTask, and isDivisionOp into isDivOp. What do you think?

jlebar added inline comments. Feb 15 2017, 10:18 AM

lib/Transforms/Utils/BypassSlowDivision.cpp
80	> During this optimization we generally use 'slow division' to refer to both Div and Rem operations. I believe that strictly following your suggestion would be overkill. For example, the file itself and its main entry point are named BypassSlowDivision.cpp and llvm::bypassSlowDivision. I'm not sure if we should rename them into something like bypassSlowDivisionOrRemainder.

	I agree that renaming the file and pass would be overkill. But I think we can still improve readability by being just a little less strict about the suggestion. Even if we had a comment explaining that "division" == "div or mod" at the top of the file, that would be us inventing a new naming idiom, which is going to add to the cognitive cost to users of understanding our code. And realistically speaking, a large fraction of people aren't going to notice this comment at all, in which case they're just going to be lost. In contrast, `SlowDivOrRem` is the same number of characters as `SlowDivision`, but is unambiguous. I'd also be OK with `SlowInstr` if you prefer that.

	Again I don't think we need to be totally strict about the rule. I don't think that `FastDivInsertionTask` is ambiguous, for example, although perhaps a name like `FastDivRemInsertionTask` would more closely match what it does (it always inserts both instrs). But the `SlowDiv` member here is, I think, ambiguous, because some of the members here are necessarily divs. Similarly, I think `createDivRuntimeCheck` would be less ambiguous as `insertOperandRuntimeCheck` or maybe something like `insertFastSlowBranch`, and `insertFastDiv` would be better named `insertFastDivAndRem`.
80	Do we really need this field, since it's just SlowDiv->getOpcode()?
82	Similarly, do we need this field, since it can be derived from SlowDiv? You could have a member function for it if it's annoying to get N separate times. I think a member function would be better because it makes it explicit that this is SlowDiv->getType().
84	Similarly for the MainBB field.
86	Similarly for the Dividend and Divisor fields.
98	FastBB and SlowBB, to match the functions right above?
118	Can some of these be made private and/or deleted now?
126	Consider using inline initializers for these, otherwise it's easy to add a member variable and forget to initialize it. class Foo { void* ptr = nullptr; };
128	Can we do this in the initialization list?
157–159	s/Cache/cache/
159	Perhaps

	    // Skip division on vector types; only optimize integer instructions.
	    auto *SlowType = dyn_cast<IntegerType>(I->getType());
	    if (!SlowType)
	      return;
	    auto BI = BypassWidths.find(SlowType->getBitWidth());

	(Make SlowType a local variable, use dyn_cast instead of checking twice, and use auto for the iterator type. Also get rid of the comma splice in the comment and add a period.)
161	Nit, please add periods at the end of sentences.
168	slow div and rem operations
179	previously-computed
187	fast div and rem operations
190	auto
194	It seems like much less of a layering violation to me if insertFastDiv (possibly renamed to `createFastDivOrRem`) would return Optional<std::pair<Value, Value>>. Then we would update the cache here.
197–224	While we're here, probably should add missing word: "because this optimization only handles positive numbers"
248	"Creates a runtime check to test whether both the divisor and dividend fit into BypassType"?
250	Nit, a linebreak without any intervening whitespace like we have here isn't a meaningful punctuation. Please flow as a single paragraph, or split into two paragraphs by inserting a blank line.
324	This sentence runs on. We could remove "and the longer-slower div/rem instruction otherwise." Otherwise, can you split into two sentences?
337	Maybe "Split the basic block before the div/rem."
339	This line could use a comment, I think.
343–346	Perhaps we should update this comment.
346	Oh yes, I like this much better.

	> The only reason I haven't done this yet is that I don't like very much returning structures by value.

	If it makes you feel any better, the calling convention returns large structs "by outparam". :)
363	I know I wrote it this way initially, but now that I see it...`MadeChange = true`? :)
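Editorial sketch: the runtime check discussed in the comments on lines 248 and 324 ("test whether both the divisor and dividend fit into BypassType", then take the fast or the slow path) can be modelled in plain C++ for the unsigned 64-bit to 32-bit case. The function names are hypothetical and the fixed widths are an assumption; the pass itself emits IR and takes its widths from BypassWidths.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// True when both operands fit into 32 bits. OR-ing the operands first means a
// single test of the upper half covers the dividend and the divisor together.
inline bool fitsInBypassWidth(uint64_t Dividend, uint64_t Divisor) {
  return ((Dividend | Divisor) >> 32) == 0;
}

// Returns {quotient, remainder}. Both are always computed, mirroring the
// observation in the review that the transformation inserts both a div and a
// rem. Divisor must be nonzero.
inline std::pair<uint64_t, uint64_t> divRemWithBypass(uint64_t Dividend,
                                                      uint64_t Divisor) {
  assert(Divisor != 0 && "division by zero");
  if (fitsInBypassWidth(Dividend, Divisor)) {
    // Fast path: narrow operands, narrow division, widen the results.
    uint32_t N = static_cast<uint32_t>(Dividend);
    uint32_t D = static_cast<uint32_t>(Divisor);
    return {static_cast<uint64_t>(N / D), static_cast<uint64_t>(N % D)};
  }
  // Slow path: full-width division.
  return {Dividend / Divisor, Dividend % Divisor};
}
```

Both paths produce the same values; only the width of the division differs, which is where the saving comes from on targets whose wide divide is slow.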

n.bozhenov updated this revision to Diff 88939. Feb 17 2017, 12:08 PM

n.bozhenov marked 24 inline comments as done. Feb 17 2017, 12:18 PM

n.bozhenov added inline comments.

lib/Transforms/Utils/BypassSlowDivision.cpp
84	Things get more difficult with MainBB. After splitting the basic block it is not obvious any more what basic block SlowDivOrRem->getParent() will return. So, I believe it's better to keep MainBB member in the class.
194	Right. I have moved all the caching logic into this routine. As a result, I could get rid of Cache field in the class. Now it's only passed here as a method argument.
343–346	Not sure what's wrong with the comment.

Sorry to keep going back and forth on this. I'm almost happy, but still have some meaty comments.

lib/Transforms/Utils/BypassSlowDivision.cpp
78	I think we could still be more explicit about which variables have to do with division specifically and which have to do with div/rem. Right now it's confusing that IsSlowDivision actually means "should we try to generate a fast div/rem for SlowDivOrRem?". Similarly it's confusing that `isSignedDiv` means "is SlowDivOrRem a signed div or rem?". Maybe rename "IsSlowDivision" to "IsValid" or "IsValidTask"? And `isSignedDiv` to `isSignedOp` to match `isDivisionOp`?
84	sgtm. It's a lot more clear now.
90	isSignedOp()?
165	Nit, perhaps clearer as a ternary: return isDivisionOp() ? Value.Quotient : Value.Remainder;
170	"IncomingDivRemPair" is kind of a confusing name, because the values are "outgoing" from one block and "incoming" to another. It's also subtly different from DivRemResult. It's going to be true that at least one of the Value*s inside our IncomingDivRemPair is an Instruction, right? If so, maybe we should:
	- Rename DivRemResult to DivRemPair.
	- Get rid of IncomingDivRemPair.
	- Add a DivRemPair::basicBlock() method that gets the BB from its Values (assuming that at least one is an Instruction).
187	It's kind of confusing that DivRemResult is "nullable". While the result of insertFastDivAndRem is nullable, the value inside of the cache must not be null. Seems like it would be clearer if we made DivRemResult not nullable and then returned an optional<DivRemResult> from insertFastDivAndRem.
194	Perhaps just CacheI = Cache.Insert({Key, DivRem}).first ? Not sure we need the comment if you write it like this, either.
343–346	I could have sworn it used to say "identifies division instructions", but now it says "DIV/REM" instructions, which is all I wanted. We're good here.
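Editorial sketch: the suggestion on line 187 (keep the cached type non-nullable and let only the producing function return an optional) looks roughly like this in standard C++17. The review refers to LLVM's `Optional` and `DenseMap`; `std::optional` and `std::map` are swapped in here so the example is self-contained, and all names are hypothetical.

```cpp
#include <map>
#include <optional>
#include <utility>

// A value that always holds a valid quotient and remainder.
struct QuotRem {
  unsigned Quotient;
  unsigned Remainder;
};

using Key = std::pair<unsigned, unsigned>; // (dividend, divisor)
using QuotRemCache = std::map<Key, QuotRem>;

// Producing a result can fail, so the *function* returns an optional...
inline std::optional<QuotRem> computeQuotRem(unsigned N, unsigned D) {
  if (D == 0)
    return std::nullopt;
  return QuotRem{N / D, N % D};
}

// ...while the cache only ever stores successfully computed values.
inline std::optional<QuotRem> lookupOrCompute(QuotRemCache &Cache, unsigned N,
                                              unsigned D) {
  auto It = Cache.find(Key{N, D});
  if (It != Cache.end())
    return It->second;
  std::optional<QuotRem> Result = computeQuotRem(N, D);
  if (Result)
    Cache.emplace(Key{N, D}, *Result);
  return Result;
}
```

The benefit is that code reading from the cache never needs a null check; failure is visible only at the one call site that can actually fail.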

n.bozhenov updated this revision to Diff 89065. Feb 19 2017, 7:56 AM

n.bozhenov marked an inline comment as done.

n.bozhenov marked 5 inline comments as done. Feb 19 2017, 8:03 AM

n.bozhenov added inline comments.

lib/Transforms/Utils/BypassSlowDivision.cpp
170	I do agree that DivRemResult and IncomingDivRemPair have something in common and it's tempting to get rid of one of them. On the other hand, they serve different purposes and have no complex logic to share. As a result, I believe these two structures should be kept separate from one another. I'm not sure their names are perfect, though.

	DivRemResult is a type that is used to cache results of bypassing. Usually, it is a pair of PHINodes. However, in D29897 there is a special case when it is a pair of ZExt instructions.

	IncomingDivRemPair structures are used to create two PHINodes in createDivRemPhiNodes. Each of the IncomingDivRemPair arguments describes one of the incoming values for the PHINodes. Again, in D29897 there is a special case when Quotient = Zero and Remainder = Dividend. In this case it would be impossible to derive information about the PHINode predecessor from DivRemPair. So, basic block information should be explicitly included somehow in the createDivRemPhiNodes arguments.

	Moreover, DivOpInfo and DivRemResult are auxiliary types for caching only. They contain only information significant for caching and it is extremely unlikely that any of these types will be extended in future. On the other hand, one may want to pass some additional information to createDivRemPhiNodes in future (e.g. some sort of metadata) and add additional fields to IncomingDivRemPair. That is not a problem only if DivRemResult and IncomingDivRemPair are two different types.

jlebar added inline comments. Feb 19 2017, 10:33 AM

lib/Transforms/Utils/BypassSlowDivision.cpp
47	Now we can remove this constructor, right?
170	Okay, how about we just name them based on what they contain, `QuotRemPair` and `QuotRemWithBB`?
192	It's not really a "div rem" -- it's the result of the div and rem, which is a quotient plus a remainder. Maybe just QuotRem or Result?
334	`return None` here and elsewhere.

n.bozhenov updated this revision to Diff 89133. Feb 20 2017, 10:29 AM

n.bozhenov marked 4 inline comments as done. Feb 20 2017, 10:36 AM

n.bozhenov added inline comments.

lib/Transforms/Utils/BypassSlowDivision.cpp
47	Yep.
170	Definitely, such names have the advantage of being absolutely unambiguous.
192	Makes sense.
334	Right. Thanks.

lgtm with comments updated. \o/

lib/Transforms/Utils/BypassSlowDivision.cpp
42	Please update comment. This is now just a pair, it's not the result of bypassing.
46	Please update comment. This is now just two values plus a BB. If you want, you can say what the BB is supposed to represent. We should also say how this should be used, rather than describing how it is currently used in the algorithm. Perhaps: A quotient and remainder, plus a BB from which they logically "originate". If you use Quotient or Remainder in a Phi node, you should use BB as its corresponding predecessor.
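Editorial sketch: the contract proposed for the "two values plus a BB" struct (whoever puts Quotient or Remainder into a phi node uses BB as the matching predecessor) can be shown with toy types. `Block`, `Val`, `ToyPhi` and `createQuotRemPhis` are hypothetical stand-ins and do not use or reproduce the LLVM API.

```cpp
#include <initializer_list>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a basic block and an IR value.
struct Block { std::string Name; };
struct Val { std::string Name; };

// A quotient and remainder, plus the block from which they logically
// "originate".
struct QuotRemWithBB {
  Block *BB = nullptr;
  Val *Quotient = nullptr;
  Val *Remainder = nullptr;
};

// A toy phi: a list of (incoming value, predecessor block) pairs.
struct ToyPhi {
  std::vector<std::pair<Val *, Block *>> Incoming;
  void addIncoming(Val *V, Block *Pred) { Incoming.emplace_back(V, Pred); }
};

struct PhiPair {
  ToyPhi Quotient;
  ToyPhi Remainder;
};

// Builds one phi for the quotients and one for the remainders. Each incoming
// value is paired with the BB stored next to it, as the comment prescribes.
inline PhiPair createQuotRemPhis(const QuotRemWithBB &LHS,
                                 const QuotRemWithBB &RHS) {
  PhiPair Phis;
  for (const QuotRemWithBB *In : {&LHS, &RHS}) {
    Phis.Quotient.addIncoming(In->Quotient, In->BB);
    Phis.Remainder.addIncoming(In->Remainder, In->BB);
  }
  return Phis;
}
```

Carrying the block explicitly also covers the case raised earlier in the thread where a value is a constant or an existing operand and so has no defining block of its own.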

This revision is now accepted and ready to land. Feb 20 2017, 10:53 AM

In D29896#681722, @jlebar wrote:

lgtm with comments updated. \o/

Thank you very much for reviewing the patch and for the very helpful suggestions.

Closed by commit rL296828: [BypassSlowDivision] Refactor fast division insertion logic (NFC) (authored by n.bozhenov). Mar 2 2017, 2:17 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
lib/Transforms/Utils/BypassSlowDivision.cpp	342 lines

Diff 88213

lib/Transforms/Utils/BypassSlowDivision.cpp

[33 lines hidden]	struct DivOpInfo {

DivOpInfo(bool InSignedOp, Value *InDividend, Value *InDivisor)		DivOpInfo(bool InSignedOp, Value *InDividend, Value *InDivisor)
: SignedOp(InSignedOp), Dividend(InDividend), Divisor(InDivisor) {}		: SignedOp(InSignedOp), Dividend(InDividend), Divisor(InDivisor) {}
};		};

struct DivPhiNodes {		struct DivPhiNodes {
PHINode *Quotient;		PHINode *Quotient;
PHINode *Remainder;		PHINode *Remainder;

DivPhiNodes(PHINode *InQuotient, PHINode *InRemainder)		DivPhiNodes(PHINode *InQuotient, PHINode *InRemainder)
: Quotient(InQuotient), Remainder(InRemainder) {}		: Quotient(InQuotient), Remainder(InRemainder) {}
};		};
}		}
namespace llvm {		namespace llvm {
template<>		template<>
struct DenseMapInfo<DivOpInfo> {		struct DenseMapInfo<DivOpInfo> {
static bool isEqual(const DivOpInfo &Val1, const DivOpInfo &Val2) {		static bool isEqual(const DivOpInfo &Val1, const DivOpInfo &Val2) {
return Val1.SignedOp == Val2.SignedOp &&		return Val1.SignedOp == Val2.SignedOp &&
Val1.Dividend == Val2.Dividend &&		Val1.Dividend == Val2.Dividend &&
Val1.Divisor == Val2.Divisor;		Val1.Divisor == Val2.Divisor;
}		}
[9 lines hidden]	struct DenseMapInfo<DivOpInfo> {
static unsigned getHashValue(const DivOpInfo &Val) {		static unsigned getHashValue(const DivOpInfo &Val) {
return (unsigned)(reinterpret_cast<uintptr_t>(Val.Dividend) ^		return (unsigned)(reinterpret_cast<uintptr_t>(Val.Dividend) ^
reinterpret_cast<uintptr_t>(Val.Divisor)) ^		reinterpret_cast<uintptr_t>(Val.Divisor)) ^
(unsigned)Val.SignedOp;		(unsigned)Val.SignedOp;
}		}
};		};

typedef DenseMap<DivOpInfo, DivPhiNodes> DivCacheTy;		typedef DenseMap<DivOpInfo, DivPhiNodes> DivCacheTy;
		typedef DenseMap<unsigned, unsigned> BypassWidthsTy;
}		}

// insertFastDiv - Substitutes the div/rem instruction with code that checks the		namespace {
// value of the operands and uses a shorter-faster div/rem instruction when		class FastDivInsertionTask {
// possible and the longer-slower div/rem instruction otherwise.		// These fields are set during initialization.
static bool insertFastDiv(Instruction *I, IntegerType *BypassType,		unsigned Opcode;
bool UseDivOp, bool UseSignedOp,		Instruction *SlowDiv;
DivCacheTy &PerBBDivCache) {		IntegerType *SlowType;
Function *F = I->getParent()->getParent();		IntegerType *BypassType;
// Get instruction operands		BasicBlock *MainBB;
Value *Dividend = I->getOperand(0);		Value *Dividend;
Value *Divisor = I->getOperand(1);		Value *Divisor;

if (isa<ConstantInt>(Divisor)) {		// These fields are used to pass intermediate results through private
// Division by a constant should have been been solved and replaced earlier		// methods.
// in the pipeline.		Value *ShortQuotientV;
		Value *ShortRemainderV;
		Value *LongQuotientV;
		Value *LongRemainderV;

		BasicBlock *createSlowBB(BasicBlock *Successor);
		BasicBlock *createFastBB(BasicBlock *Successor);
		void createDivRemPhiNodes(BasicBlock *ShortBB, BasicBlock *LongBB,
		BasicBlock *PhiBB, DivCacheTy &PerBBDivCache);
		Value *createDivRunTimeCheck();

		public:
		/// Sets up the object to work with instruction \p I. Returns false if the
		/// instruction is not a scalar integer division operation.
		bool setScalarIDivOp(Instruction *I);
		void setBypassType(IntegerType *BT) { BypassType = BT; }

		Instruction *getSlowDiv() { return SlowDiv; }
		IntegerType *getSlowType() { return SlowType; }
		Value *getDividend() { return Dividend; }
		Value *getDivisor() { return Divisor; }

		bool isSignedDiv() {
		return Opcode == Instruction::SDiv || Opcode == Instruction::SRem;
		}
		bool isDivisionOp() {
		return Opcode == Instruction::SDiv || Opcode == Instruction::UDiv;
		}

		/// Tries to replace a division with a faster code and returns true if
		/// succeeds.
		bool insertFastDiv(DivCacheTy &PerBBDivCache);
		};

		class FastDivInserter {
		const BypassWidthsTy &BypassWidths;
		DivCacheTy PerBBDivCache;

		public:
		FastDivInserter(const BypassWidthsTy &BW) : BypassWidths(BW) {}
		DivCacheTy &getDivCache() { return PerBBDivCache; }

		/// Checks that \p I is indeed a slow division and initializes \p Task
		/// properly. In case of success returns true.
		bool isSlowDiv(Instruction *I, FastDivInsertionTask &Task);

		/// Tries to replace a slow division with either existing operations or with
		/// new ones. Returns true if succeeds.
		bool reuseOrInsertFastDiv(FastDivInsertionTask &Task);
		};
		}

		bool FastDivInsertionTask::setScalarIDivOp(Instruction *I) {
		SlowDiv = I;
		Opcode = I->getOpcode();
		BypassType = nullptr; // To make sure BypassType is set before using.

		// Only optimize div or rem ops
		switch (Opcode) {
		case Instruction::UDiv:
		case Instruction::SDiv:
		case Instruction::URem:
		case Instruction::SRem:
		break;
		default:
return false;		return false;
}		}

// If the numerator is a constant, bail if it doesn't fit into BypassType.		// Skip division on vector types, only optimize integer instructions
if (ConstantInt *ConstDividend = dyn_cast<ConstantInt>(Dividend))		if (!I->getType()->isIntegerTy())
if (ConstDividend->getValue().getActiveBits() > BypassType->getBitWidth())
return false;		return false;

// Basic Block is split before divide		SlowType = cast<IntegerType>(I->getType());
BasicBlock *MainBB = &*I->getParent();		MainBB = I->getParent();
BasicBlock *SuccessorBB = MainBB->splitBasicBlock(I);		Dividend = SlowDiv->getOperand(0);
		Divisor = SlowDiv->getOperand(1);

		return true;
		}

// Add new basic block for slow divide operation		/// Add new basic block for slow divide operation and put it before SuccessorBB.
BasicBlock *SlowBB =		BasicBlock *FastDivInsertionTask::createSlowBB(BasicBlock *SuccessorBB) {
		jlebarUnsubmitted Not Done Reply Inline Actions "IncomingDivRemPair" is kind a confusing name, because the values are "outgoing" from one block and "incoming" to another. It's also subtly different from DivRemResult. It's going to be true that at least one of the Values inside our IncomingDivRemPair is an Instruction, right? If so, maybe we should Rename DivRemResult to DivRemPair Get rid of IncomingDivRemPair Add a DivRemPair::basicBlock() method that gets the BB from its Values (assuming that at least one is an Instruction). jlebar:* "IncomingDivRemPair" is kind a confusing name, because the values are "outgoing" from one block…
		n.bozhenovAuthorUnsubmitted Not Done Reply Inline Actions I do agree that DivRemResult and IncomingDivRemPair have something in common and it's tempting to get rid of one of them. On the other hand, they serve different purposes and have no complex logic to share. As a result, I believe these two structures should be kept separate one from another. Not sure about if their names are perfect, though. DivRemResult is a type that is used to cache results of bypassing. Usually, it is a pair of PHINodes. However, in D29897 there is a special case when it is a pair of ZExt instructions. IncomingDivRemPair structures are used to create two PHINodes in createDivRemPhiNodes. Each of the IncomingDivRemPair arguments describes one of the incoming values for the PHINodes. Again, in D29897 there is a special case when Quotient = Zero and Remainder = Dividend. In this case it would be impossible to derive information about PHINode predecessor from DivRemPair. So, basic block information should be explicitely included somehow into createDivRemPhiNodes arguments. Moreover, DivOpInfo and DivRemResult are auxilliary types for caching only. They contain only information significant for caching and it is extremely unlikely that any of these types will be extended in future. On the other hand, one may want to pass some additional information to createDivRemPhiNodes in future (e.g. some sort of metadata) and add additional fields to IncomingDivRemPair. It is not a problem only if DivRemResult and IncomingDivRemPair are two different types. n.bozhenov: I do agree that DivRemResult and IncomingDivRemPair have something in common and it's tempting…
		jlebar: Okay, how about we just name them based on what they contain, `QuotRemPair` and `QuotRemWithBB`?
		n.bozhenov: Definitely, such names have the advantage of being absolutely unambiguous.
BasicBlock::Create(F->getContext(), "", MainBB->getParent(), SuccessorBB);		BasicBlock *SlowBB = BasicBlock::Create(MainBB->getParent()->getContext(), "",
SlowBB->moveBefore(SuccessorBB);		MainBB->getParent(), SuccessorBB);
IRBuilder<> SlowBuilder(SlowBB, SlowBB->begin());		IRBuilder<> Builder(SlowBB, SlowBB->begin());
Value *SlowQuotientV;
Value *SlowRemainderV;		if (isSignedDiv()) {
if (UseSignedOp) {		LongQuotientV = Builder.CreateSDiv(Dividend, Divisor);
SlowQuotientV = SlowBuilder.CreateSDiv(Dividend, Divisor);		LongRemainderV = Builder.CreateSRem(Dividend, Divisor);
SlowRemainderV = SlowBuilder.CreateSRem(Dividend, Divisor);
} else {		} else {
SlowQuotientV = SlowBuilder.CreateUDiv(Dividend, Divisor);		LongQuotientV = Builder.CreateUDiv(Dividend, Divisor);
		jlebar: previously-computed
SlowRemainderV = SlowBuilder.CreateURem(Dividend, Divisor);		LongRemainderV = Builder.CreateURem(Dividend, Divisor);
}		}
SlowBuilder.CreateBr(SuccessorBB);

// Add new basic block for fast divide operation		Builder.CreateBr(SuccessorBB);
BasicBlock *FastBB =		return SlowBB;
BasicBlock::Create(F->getContext(), "", MainBB->getParent(), SuccessorBB);		}
FastBB->moveBefore(SlowBB);
IRBuilder<> FastBuilder(FastBB, FastBB->begin());		/// Add new basic block for fast divide operation and put it before SuccessorBB.
		jlebar: fast div and rem operations
		jlebar: It's kind of confusing that DivRemResult is "nullable". While the result of insertFastDivAndRem is nullable, the value inside of the cache must not be null. Seems like it would be clearer if we made DivRemResult not nullable and then returned an optional<DivRemResult> from insertFastDivAndRem.
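		For readers following the nullable-versus-optional point, here is a minimal standalone sketch of the idea. It is not the patch's code: it uses C++17 `std::optional` as a stand-in for `llvm::Optional`, plain integers instead of `Value` pointers, and the names `QuotRem` and `tryFastDivRem` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the cached result type. Once constructed it
// always holds a valid quotient and remainder, so it has no "null" state.
struct QuotRem {
  uint64_t Quotient;
  uint64_t Remainder;
};

// Hypothetical stand-in for insertFastDivAndRem. "Nothing was produced" is
// expressed in the return type rather than inside QuotRem itself.
std::optional<QuotRem> tryFastDivRem(uint64_t Dividend, uint64_t Divisor) {
  if (Divisor == 0)
    return std::nullopt; // bail out: no result to cache
  return QuotRem{Dividend / Divisor, Dividend % Divisor};
}
```

		With this shape, anything stored in a cache is a plain `QuotRem` and never needs a null check; only the caller of `tryFastDivRem` has to handle the empty case.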
Value *ShortDivisorV = FastBuilder.CreateCast(Instruction::Trunc, Divisor,		BasicBlock *FastDivInsertionTask::createFastBB(BasicBlock *SuccessorBB) {
BypassType);		BasicBlock *FastBB = BasicBlock::Create(MainBB->getParent()->getContext(), "",
Value *ShortDividendV = FastBuilder.CreateCast(Instruction::Trunc, Dividend,		MainBB->getParent(), SuccessorBB);
		jlebar: auto
BypassType);		IRBuilder<> Builder(FastBB, FastBB->begin());
		Value *ShortDivisorV =
		jlebar: It's not really a "div rem" -- it's the result of the div and rem, which is a quotient plus a remainder. Maybe just QuotRem or Result?
		n.bozhenov: Makes sense.
		Builder.CreateCast(Instruction::Trunc, Divisor, BypassType);
		Value *ShortDividendV =
		jlebar: It seems like much less of a layering violation to me if insertFastDiv (possibly renamed to `createFastDivOrRem`) would return Optional<std::pair<Value *, Value *>>. Then we would update the cache here.
		n.bozhenov: Right. I have moved all the caching logic into this routine. As a result, I could get rid of the Cache field in the class. Now it's only passed here as a method argument.
		jlebar: Perhaps just `CacheI = Cache.insert({Key, DivRem}).first`? Not sure we need the comment if you write it like this, either.
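		The `insert(...).first` idiom suggested above can be tried outside LLVM. The following is a minimal sketch under stated assumptions: `std::map` stands in for the pass's `DenseMap`-based cache, the key is a plain (dividend, divisor) pair rather than `DivOpInfo`, and `lookupOrCompute` is a hypothetical helper name.

```cpp
#include <cassert>
#include <map>
#include <utility>

// Hypothetical simplified types. The real cache is keyed on
// (signedness, dividend, divisor) and stores Value pointers.
using Key = std::pair<int, int>;
struct QuotRem {
  int Quotient;
  int Remainder;
};
using DivCache = std::map<Key, QuotRem>;

// Look up (Dividend, Divisor). On a miss, compute the pair and insert it.
// insert() returns {iterator, inserted}; .first is the iterator to the entry,
// so both the hit and the miss path end with a valid CacheI.
// Assumes Divisor != 0.
const QuotRem &lookupOrCompute(DivCache &Cache, int Dividend, int Divisor) {
  Key K{Dividend, Divisor};
  auto CacheI = Cache.find(K);
  if (CacheI == Cache.end())
    CacheI =
        Cache.insert({K, QuotRem{Dividend / Divisor, Dividend % Divisor}}).first;
  return CacheI->second;
}
```

		Because the iterator is reassigned on a miss, the code after the `if` does not need to distinguish a fresh entry from a reused one.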
		Builder.CreateCast(Instruction::Trunc, Dividend, BypassType);

// udiv/urem because optimization only handles positive numbers		// udiv/urem because optimization only handles positive numbers
Value *ShortQuotientV = FastBuilder.CreateUDiv(ShortDividendV, ShortDivisorV);		Value *ShortQV = Builder.CreateUDiv(ShortDividendV, ShortDivisorV);
Value *ShortRemainderV = FastBuilder.CreateURem(ShortDividendV,		Value *ShortRV = Builder.CreateURem(ShortDividendV, ShortDivisorV);
ShortDivisorV);		ShortQuotientV = Builder.CreateCast(Instruction::ZExt, ShortQV, SlowType);
Value *FastQuotientV = FastBuilder.CreateCast(Instruction::ZExt,		ShortRemainderV = Builder.CreateCast(Instruction::ZExt, ShortRV, SlowType);
ShortQuotientV,		Builder.CreateBr(SuccessorBB);
Dividend->getType());
Value *FastRemainderV = FastBuilder.CreateCast(Instruction::ZExt,		return FastBB;
ShortRemainderV,		}
Dividend->getType());
FastBuilder.CreateBr(SuccessorBB);		// Create Phi nodes for result of Div and Rem.
		void FastDivInsertionTask::createDivRemPhiNodes(BasicBlock *ShortBB,
// Phi nodes for result of div and rem		BasicBlock *LongBB,
IRBuilder<> SuccessorBuilder(SuccessorBB, SuccessorBB->begin());		BasicBlock *PhiBB,
PHINode *QuoPhi = SuccessorBuilder.CreatePHI(I->getType(), 2);		DivCacheTy &PerBBDivCache) {
QuoPhi->addIncoming(SlowQuotientV, SlowBB);		IRBuilder<> Builder(PhiBB, PhiBB->begin());
QuoPhi->addIncoming(FastQuotientV, FastBB);		PHINode *QuoPhi = Builder.CreatePHI(SlowType, 2);
PHINode *RemPhi = SuccessorBuilder.CreatePHI(I->getType(), 2);		QuoPhi->addIncoming(LongQuotientV, LongBB);
RemPhi->addIncoming(SlowRemainderV, SlowBB);		QuoPhi->addIncoming(ShortQuotientV, ShortBB);
RemPhi->addIncoming(FastRemainderV, FastBB);		PHINode *RemPhi = Builder.CreatePHI(SlowType, 2);
		RemPhi->addIncoming(LongRemainderV, LongBB);
// Replace I with appropriate phi node		RemPhi->addIncoming(ShortRemainderV, ShortBB);
if (UseDivOp)
I->replaceAllUsesWith(QuoPhi);		// Replace SlowDiv with appropriate phi node
		if (isDivisionOp())
		SlowDiv->replaceAllUsesWith(QuoPhi);
else		else
I->replaceAllUsesWith(RemPhi);		SlowDiv->replaceAllUsesWith(RemPhi);
		jlebar: While we're here, probably should add missing word: "because this optimization only handles positive numbers"
I->eraseFromParent();

// Combine operands into a single value with OR for value testing below		SlowDiv->eraseFromParent();
MainBB->getInstList().back().eraseFromParent();		SlowDiv = nullptr;
IRBuilder<> MainBuilder(MainBB, MainBB->end());
		// Cache phi nodes to be used later in place of other instances
		// of div or rem with the same sign, dividend, and divisor
		DivOpInfo Key(isSignedDiv(), Dividend, Divisor);
		DivPhiNodes Val(QuoPhi, RemPhi);
		PerBBDivCache.insert(std::pair<DivOpInfo, DivPhiNodes>(Key, Val));
		}

		// Creates a run-time check to test whether both operands fit the shorter type.
		// The check is inserted at the end of MainBB.
		// True return value means that the operands fit.
		Value *FastDivInsertionTask::createDivRunTimeCheck() {
		IRBuilder<> Builder(MainBB, MainBB->end());

// We should have bailed out above if the divisor is a constant, but the		// We should have bailed out above if the divisor is a constant, but the
// dividend may still be a constant. Set OrV to our non-constant operands		// dividend may still be a constant. Set OrV to our non-constant operands
// OR'ed together.		// OR'ed together.
assert(!isa<ConstantInt>(Divisor));		assert(!isa<ConstantInt>(Divisor));

Value *OrV;		Value *OrV;
if (!isa<ConstantInt>(Dividend))		if (!isa<ConstantInt>(Dividend))
		jlebar: "Creates a runtime check to test whether both the divisor and dividend fit into BypassType"?
OrV = MainBuilder.CreateOr(Dividend, Divisor);		OrV = Builder.CreateOr(Dividend, Divisor);
else		else
		jlebar: Nit, a linebreak without any intervening whitespace like we have here isn't a meaningful punctuation. Please flow as a single paragraph, or split into two paragraphs by inserting a blank line.
OrV = Divisor;		OrV = Divisor;

// BitMask is inverted to check if the operands are		// BitMask is inverted to check if the operands are
// larger than the bypass type		// larger than the bypass type
uint64_t BitMask = ~BypassType->getBitMask();		uint64_t BitMask = ~BypassType->getBitMask();
Value *AndV = MainBuilder.CreateAnd(OrV, BitMask);		Value *AndV = Builder.CreateAnd(OrV, BitMask);

// Compare operand values and branch		// Compare operand values
Value *ZeroV = ConstantInt::getSigned(Dividend->getType(), 0);		Value *ZeroV = ConstantInt::getSigned(SlowType, 0);
Value *CmpV = MainBuilder.CreateICmpEQ(AndV, ZeroV);		return Builder.CreateICmpEQ(AndV, ZeroV);
MainBuilder.CreateCondBr(CmpV, FastBB, SlowBB);		}
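		The run-time check built by createDivRunTimeCheck (OR the operands, AND with the inverted bypass mask, compare with zero) can be illustrated on plain integers. This is a sketch only: it assumes a 64-bit slow type, and `operandsFitBypassWidth` is a hypothetical name, not a function in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Returns true when both operands fit into BypassBits bits, i.e. when a
// narrower unsigned divide would give the same result as the wide one.
// Assumes 0 < BypassBits < 64.
bool operandsFitBypassWidth(uint64_t Dividend, uint64_t Divisor,
                            unsigned BypassBits) {
  uint64_t LowMask = (uint64_t{1} << BypassBits) - 1; // bits of the short type
  uint64_t HighBits = ~LowMask;                       // the inverted mask
  // OR the operands so that a single AND tests both of them at once.
  return ((Dividend | Divisor) & HighBits) == 0;
}
```

		In two's complement a negative operand has its high bits set, so it fails this test and is routed to the slow block, which is consistent with the fast block using only udiv/urem.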

		// insertFastDiv - Substitutes the div/rem instruction with code that checks the
		// value of the operands and uses a shorter-faster div/rem instruction when
		// possible and the longer-slower div/rem instruction otherwise.
		bool FastDivInsertionTask::insertFastDiv(DivCacheTy &PerBBDivCache) {
		if (isa<ConstantInt>(Divisor)) {
		// Division by a constant should have been been solved and replaced earlier
		// in the pipeline.
		return false;
		}

		// If the numerator is a constant, bail if it doesn't fit into BypassType.
		if (ConstantInt *ConstDividend = dyn_cast<ConstantInt>(Dividend))
		if (ConstDividend->getValue().getActiveBits() > BypassType->getBitWidth())
		return false;

		// Basic Block is split before divide
		BasicBlock *SuccessorBB = MainBB->splitBasicBlock(SlowDiv);
		MainBB->getInstList().back().eraseFromParent();
		BasicBlock *FastBB = createFastBB(SuccessorBB);
		BasicBlock *SlowBB = createSlowBB(SuccessorBB);
		createDivRemPhiNodes(FastBB, SlowBB, SuccessorBB, PerBBDivCache);
		Value *CmpV = createDivRunTimeCheck();
		IRBuilder<> Builder(MainBB, MainBB->end());
		Builder.CreateCondBr(CmpV, FastBB, SlowBB);

		return true;
		}

		bool FastDivInserter::isSlowDiv(Instruction *I, FastDivInsertionTask &Task) {
		if (!Task.setScalarIDivOp(I))
		return false;

		unsigned int bitwidth = Task.getSlowType()->getBitWidth();

		// Skip if bitwidth is not bypassed
		BypassWidthsTy::const_iterator BI = BypassWidths.find(bitwidth);
		if (BI == BypassWidths.end())
		return false;

		// Get type for div/rem instruction with bypass bitwidth
		IntegerType *BT = IntegerType::get(I->getContext(), BI->second);
		Task.setBypassType(BT);

// Cache phi nodes to be used later in place of other instances
// of div or rem with the same sign, dividend, and divisor
DivOpInfo Key(UseSignedOp, Dividend, Divisor);
DivPhiNodes Value(QuoPhi, RemPhi);
PerBBDivCache.insert(std::pair<DivOpInfo, DivPhiNodes>(Key, Value));
return true;		return true;
}		}

// reuseOrInsertFastDiv - Reuses previously computed dividend or remainder from		// reuseOrInsertFastDiv - Reuses previously computed dividend or remainder from
// the current BB if operands and operation are identical. Otherwise calls		// the current BB if operands and operation are identical. Otherwise calls
// insertFastDiv to perform the optimization and caches the resulting dividend		// insertFastDiv to perform the optimization and caches the resulting dividend
// and remainder.		// and remainder.
static bool reuseOrInsertFastDiv(Instruction *I, IntegerType *BypassType,		bool FastDivInserter::reuseOrInsertFastDiv(FastDivInsertionTask &Task) {
bool UseDivOp, bool UseSignedOp,		Instruction *SlowDiv = Task.getSlowDiv();
DivCacheTy &PerBBDivCache) {
// Get instruction operands		// Get instruction operands
DivOpInfo Key(UseSignedOp, I->getOperand(0), I->getOperand(1));		DivOpInfo Key(Task.isSignedDiv(), Task.getDividend(), Task.getDivisor());
DivCacheTy::iterator CacheI = PerBBDivCache.find(Key);		DivCacheTy::iterator CacheI = PerBBDivCache.find(Key);

if (CacheI == PerBBDivCache.end()) {		if (CacheI == PerBBDivCache.end()) {
// If previous instance does not exist, insert fast div		// If previous instance does not exist, insert fast div
return insertFastDiv(I, BypassType, UseDivOp, UseSignedOp, PerBBDivCache);		return Task.insertFastDiv(PerBBDivCache);
}		}

		jlebar: This sentence runs on. We could remove "and the longer-slower div/rem instruction otherwise." Otherwise, can you split into two sentences?
// Replace operation value with previously generated phi node		// Replace operation value with previously generated phi node
DivPhiNodes &Value = CacheI->second;		DivPhiNodes &Value = CacheI->second;
if (UseDivOp) {		if (Task.isDivisionOp()) {
// Replace all uses of div instruction with quotient phi node		// Replace all uses of div instruction with quotient phi node
I->replaceAllUsesWith(Value.Quotient);		SlowDiv->replaceAllUsesWith(Value.Quotient);
} else {		} else {
// Replace all uses of rem instruction with remainder phi node		// Replace all uses of rem instruction with remainder phi node
I->replaceAllUsesWith(Value.Remainder);		SlowDiv->replaceAllUsesWith(Value.Remainder);
}		}

		jlebar: `return None` here and elsewhere.
		n.bozhenov: Right. Thanks.
// Remove redundant operation		// Remove redundant operation
I->eraseFromParent();		SlowDiv->eraseFromParent();
return true;		return true;
		jlebar: Maybe "Split the basic block before the div/rem."
}		}

		jlebar: This line could use a comment, I think.
// bypassSlowDivision - This optimization identifies DIV instructions in a BB		// bypassSlowDivision - This optimization identifies DIV instructions in a BB
// that can be profitably bypassed and carried out with a shorter, faster		// that can be profitably bypassed and carried out with a shorter, faster
// divide.		// divide.
bool llvm::bypassSlowDivision(		bool llvm::bypassSlowDivision(BasicBlock *BB,
BasicBlock *BB, const DenseMap<unsigned int, unsigned int> &BypassWidths) {		const BypassWidthsTy &BypassWidths) {
DivCacheTy DivCache;		FastDivInserter FDI(BypassWidths);
		FastDivInsertionTask Task;
		jlebar: I'm totally behind the notion of creating classes that encapsulate our temporary state. But I'm not wild about the way we use FastDivInsertionTask as basically a bucket of mutable state. We have had really nasty bugs in LLVM with similar designs where we forget to reset one piece of the mutable state. (I think Chandler was saying that some pass effectively had a global cost cap instead of a per-function cost cap, because they just forgot to set the cost to 0 at the end of the function.)

		I went through a few iterations, and the one that seems best to me right now is to get rid of FastDivInserter, which is already very simple, and make FastDivInsertionTask a one-shot class.

		  DivCacheTy Cache;
		  while (Next != nullptr) {
		    Instruction *I = Next;
		    Next = Next->getNextNode();
		    if (Value *Replacement = FastDivInsertionTask(I, Cache, BypassWidths).getReplacement()) {
		      I->RAUW(Replacement);
		      MadeChange |= true;
		    }
		  }

		WDYT?
		n.bozhenov: Hi Justin,

		Thanks for reviewing the code so quickly. And I really like the code sample you suggested.

		If I understand correctly, your main concern is that in my code the Task object gets reused for different instructions during the pass, isn't it? If this is the case, then moving the code from FastDivInserter into the FastDivInsertionTask constructor is indeed one of the possible solutions. Another approach is to create a new Task object for each instruction inside a FastDivInserter object and never expose it outside. Something like this:

		  // In bypassSlowDivision:
		  FastDivInserter FDI;
		  while (Next != nullptr) {
		    Instruction *I = Next;
		    Next = Next->getNextNode();
		    MadeChange |= FDI.tryReplaceSlowDiv(I);
		  }

		  // In FastDivInserter::tryReplaceSlowDiv:
		  FastDivInsertionTask Task(I, BypassWidths);
		  if (!Task.isSlowDivision())
		    return false;
		  Value *R = Task.getReplacement(Cache);
		  if (!R)
		    return false;
		  I->RAUW(R);
		  return true;

		Currently I'm playing with these two approaches and will come back with an updated patch by tomorrow. Please comment if you're strongly in favour of one of these approaches.
		jlebar:
		> If I understand correctly, your main concern is that in my code the Task object gets reused for different instructions during the pass, isn't it?

		Yes. I'm also a little concerned about the mutable state inside of the Task object used for passing between functions, but I wanted to get this part figured out first.

		> Another approach is to create a new Task object for each instruction inside an FastDivInserter object and never expose it outside

		Right. If FastDivInserter has nontrivial complexity that it doesn't make sense to push into FastDivInsertionTask or the outer pass function body, this might make sense. When I looked at it, I didn't think it did.
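		The "one-shot class" pattern under discussion can be sketched without any LLVM types. In this minimal illustration, which assumes C++17, `HalveTask` and `halveEvens` are hypothetical names: the task receives all of its inputs in the constructor, holds only const state, and is destroyed after a single query, so there is nothing to reset between elements.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Hypothetical one-shot task: all state is fixed at construction time.
class HalveTask {
public:
  explicit HalveTask(int In) : Input(In) {}

  // Returns the replacement value, or nullopt if this input is not handled.
  std::optional<int> getReplacement() const {
    if (Input % 2 != 0)
      return std::nullopt;
    return Input / 2;
  }

private:
  const int Input;
};

// Driver loop in the shape proposed in the review: a fresh task per element.
bool halveEvens(std::vector<int> &Values) {
  bool MadeChange = false;
  for (int &V : Values) {
    if (std::optional<int> R = HalveTask(V).getReplacement()) {
      V = *R;
      MadeChange = true;
    }
  }
  return MadeChange;
}
```

		Since each `HalveTask` is a temporary, state left over from one element cannot leak into the next, which is the class of bug the review is trying to rule out.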
		n.bozhenov: I have updated the patch. I have got rid of the FastDivInserter class. And the Task is not reused any more; a new object is constructed for each instruction.

		As for the mutable fields used for passing data between functions, it's possible to get rid of them as well. We could define

		  struct IncomingDivRemPair {
		    BasicBlock *BB; //< PHINode predecessor for the following values.
		    Value *Quotient;
		    Value *Remainder;
		  };

		and return such a structure by value from the createSlowBB and createFastBB methods. Later, we could pass a pair of such structures into createDivRemPhiNodes to make the latter both more flexible and easier to understand. The only reason I haven't done this yet is that I don't much like returning structures by value. Do you think it is worth doing?
		jlebar: Oh yes, I like this much better.

		> The only reason I haven't done this yet is that I don't like very much returning structures by value.

		If it makes you feel any better, the calling convention returns large structs "by outparam". :)
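		As a rough illustration of returning the per-block results by value instead of through mutable members, here is a standalone sketch. It is an analogy, not the patch: blocks are represented by strings, values by integers, and `slowPath`, `fastPath`, and `selectIncoming` are hypothetical stand-ins for createSlowBB, createFastBB, and the phi creation step.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical analogue of the proposed struct: the values one predecessor
// block contributes, together with the block they come from.
struct QuotRemWithBB {
  std::string BB;
  int64_t Quotient;
  int64_t Remainder;
};

// Full-width signed divide. Assumes Divisor != 0.
QuotRemWithBB slowPath(int64_t Dividend, int64_t Divisor) {
  return {"slow", Dividend / Divisor, Dividend % Divisor};
}

// Truncate to 32 bits, divide unsigned, widen the results again.
// Assumes the truncated divisor is nonzero.
QuotRemWithBB fastPath(int64_t Dividend, int64_t Divisor) {
  uint32_t ShortDividend = static_cast<uint32_t>(Dividend);
  uint32_t ShortDivisor = static_cast<uint32_t>(Divisor);
  return {"fast", ShortDividend / ShortDivisor, ShortDividend % ShortDivisor};
}

// Analogue of a phi node: pick the incoming pair by predecessor.
const QuotRemWithBB &selectIncoming(const QuotRemWithBB &Fast,
                                    const QuotRemWithBB &Slow,
                                    bool CameFromFast) {
  return CameFromFast ? Fast : Slow;
}
```

		Each producer hands back everything the consumer needs, so the merge step takes two explicit arguments and does not read any state set as a side effect of earlier calls.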
		jlebar: Perhaps we should update this comment.
		n.bozhenov: Not sure what's wrong with the comment.
		jlebar: I could have sworn it used to say "identifies division instructions", but now it says "DIV/REM" instructions, which is all I wanted. We're good here.

bool MadeChange = false;		bool MadeChange = false;
Instruction* Next = &*BB->begin();		Instruction* Next = &*BB->begin();
while (Next != nullptr) {		while (Next != nullptr) {
// We may add instructions immediately after I, but we want to skip over		// We may add instructions immediately after I, but we want to skip over
// them.		// them.
Instruction* I = Next;		Instruction* I = Next;
Next = Next->getNextNode();		Next = Next->getNextNode();

// Get instruction details		if (!FDI.isSlowDiv(I, Task))
unsigned Opcode = I->getOpcode();
bool UseDivOp = Opcode == Instruction::SDiv || Opcode == Instruction::UDiv;
bool UseRemOp = Opcode == Instruction::SRem || Opcode == Instruction::URem;
bool UseSignedOp = Opcode == Instruction::SDiv ||
Opcode == Instruction::SRem;

// Only optimize div or rem ops
if (!UseDivOp && !UseRemOp)
continue;		continue;

// Skip division on vector types, only optimize integer instructions		MadeChange |= FDI.reuseOrInsertFastDiv(Task);
if (!I->getType()->isIntegerTy())
continue;

// Get bitwidth of div/rem instruction
IntegerType *T = cast<IntegerType>(I->getType());
unsigned int bitwidth = T->getBitWidth();

// Continue if bitwidth is not bypassed
DenseMap<unsigned int, unsigned int>::const_iterator BI = BypassWidths.find(bitwidth);
if (BI == BypassWidths.end())
continue;

// Get type for div/rem instruction with bypass bitwidth
IntegerType *BT = IntegerType::get(I->getContext(), BI->second);

MadeChange |= reuseOrInsertFastDiv(I, BT, UseDivOp, UseSignedOp, DivCache);
}		}

// Above we eagerly create divs and rems, as pairs, so that we can efficiently		// Above we eagerly create divs and rems, as pairs, so that we can efficiently
// create divrem machine instructions. Now erase any unused divs / rems so we		// create divrem machine instructions. Now erase any unused divs / rems so we
		jlebar: I know I wrote it this way initially, but now that I see it... `MadeChange = true`? :)
// don't leave extra instructions sitting around.		// don't leave extra instructions sitting around.
for (auto &KV : DivCache)		for (auto &KV : FDI.getDivCache())
for (Instruction *Phi : {KV.second.Quotient, KV.second.Remainder})		for (Instruction *Phi : {KV.second.Quotient, KV.second.Remainder})
RecursivelyDeleteTriviallyDeadInstructions(Phi);		RecursivelyDeleteTriviallyDeadInstructions(Phi);

return MadeChange;		return MadeChange;
}		}

Revision Contents

Diff 88213

lib/Transforms/Utils/BypassSlowDivision.cpp