This is an archive of the discontinued LLVM Phabricator instance.

[BypassSlowDivision] Refactor fast division insertion logic (NFC)
ClosedPublic

Authored by n.bozhenov on Feb 13 2017, 9:49 AM.

Details

Summary

The most important goal of the patch is to break large insertFastDiv function into separate pieces, so that later a different fast insertion logic can be implemented using some of these pieces.

Diff Detail

Repository
rL LLVM

Event Timeline

n.bozhenov created this revision.Feb 13 2017, 9:49 AM
jlebar added inline comments.Feb 13 2017, 4:26 PM
lib/Transforms/Utils/BypassSlowDivision.cpp
139 ↗(On Diff #88213)

Can we add a "// anonymous namespace" comment?

346 ↗(On Diff #88213)

I'm totally behind the notion of creating classes that encapsulate our temporary state. But I'm not wild about the way we use FastDivInsertionTask as basically a bucket of *mutable* state. We have had really nasty bugs in LLVM with similar designs where we forget to reset one piece of the mutable state. (I think Chandler was saying that some pass effectively had a global cost cap instead of a per-function cost cap, because they just forgot to set the cost to 0 at the end of the function.)

I went through a few iterations, and the one that seems best to me right now is to get rid of FastDivInserter, which is already very simple, and make FastDivInsertionTask a one-shot class.

DivCacheTy Cache;
while (Next != nullptr) {
  Instruction *I = Next;
  Next = Next->getNextNode();

  if (Value *Replacement = FastDivInsertionTask(I, Cache, BypassWidths).getReplacement()) {
    I->RAUW(Replacement);
    MadeChange |= true;
  }
}

WDYT?

jlebar added inline comments.Feb 13 2017, 4:29 PM
lib/Transforms/Utils/BypassSlowDivision.cpp
80 ↗(On Diff #88213)

This is actually SlowInstr or InstrToReplace or SlowDivOrRem -- we should be careful here and elsewhere not to call things divs that may be rems.

92 ↗(On Diff #88213)

I would feel a lot more comfortable if our constructor initialized all of these variables. Otherwise it seems like we're asking for trouble.

98 ↗(On Diff #88213)

"Runtime" as in "runtime check" is usually one word.

n.bozhenov added inline comments.Feb 14 2017, 1:51 PM
lib/Transforms/Utils/BypassSlowDivision.cpp
346 ↗(On Diff #88213)

Hi Justin,

Thanks for reviewing the code so quickly. And I really like the code sample you suggested. If I understand correctly, your main concern is that in my code the Task object gets reused for different instructions during the pass, isn't it? If this is the case, then moving the code from FastDivInserter into FastDivInsertionTask constructor is indeed one of the possible solutions.

Another approach is to create a new Task object for each instruction inside an FastDivInserter object and never expose it outside. Something like this:

// In bypassSlowDivision:
FastDivInserter FDI;
while (Next != nullptr) {
  Instruction *I = Next;
  Next = Next->getNextNode();
  MadeChange |= FDI.tryReplaceSlowDiv(I);
}
// In FastDivInserter::tryReplaceSlowDiv:
FastDivInsertionTask Task(I, BypassWidths);
if (!Task.isSlowDivision())
  return false;

Value *R = Task.getReplacement(Cache);
if (!R)
  return false;

I->RAUW(R);
return true;

Currently I'm playing with these two approaches and will come with an updated patch by tomorrow. Please, comment if you're strongly in favour of one of these approaches.

jlebar added inline comments.Feb 14 2017, 1:56 PM
lib/Transforms/Utils/BypassSlowDivision.cpp
346 ↗(On Diff #88213)

If I understand correctly, your main concern is that in my code the Task object gets reused for different instructions during the pass, isn't it?

Yes.

I'm also a little concerned about the mutable state inside of the Task object used for passing between functions, but I wanted to get this part figured out first.

Another approach is to create a new Task object for each instruction inside an FastDivInserter object and never expose it outside

Right. If FastDivInserter has nontrivial complexity that it doesn't make sense to push into FastDivInsertionTask or the outer pass function body, this might make sense. When I looked at it, I didn't think it did.

n.bozhenov updated this revision to Diff 88552.Feb 15 2017, 9:08 AM
n.bozhenov added inline comments.Feb 15 2017, 9:12 AM
lib/Transforms/Utils/BypassSlowDivision.cpp
346 ↗(On Diff #88213)

I have updated the patch. I have got rid of FastDivInserter class. And the Task is not reused any more, but a new object is constructed for each instruction.

As for mutable fields used for passing data between functions, it's possible to get rid of them as well. We could define

struct IncomingDivRemPair {
  BasicBlock *BB; //< PHINode predecessor for the following values.
  Value *Quotient;
  Value *Remainder;
};

and return such a structure by value from createSlowBB and createFastBB methods. Later, we could pass a pair of such structures into createDivRemPhiNodes to make the latter both more flexible and easier to understand. The only reason I haven't done this yet is that I don't like very much returning structures by value. Do you think it is worth doing?

n.bozhenov marked 3 inline comments as done.Feb 15 2017, 9:20 AM
n.bozhenov added inline comments.
lib/Transforms/Utils/BypassSlowDivision.cpp
80 ↗(On Diff #88213)

During this optimization we generally use 'slow division' to refer to both Div and Rem operations. I believe that strictly following your suggestion would be overkill. For example, the file itself and its main entry point are named BypassSlowDivision.cpp and llvm::bypassSlowDivision. I'm not sure if we should rename them into something like bypassSlowDivisionOrRemainder.

To make the naming somewhat less confusing I suggest using the full word (Division) to refer to both Div and Rem operations, and use the abbreviations (Div/Rem) to refer to specific operation kinds. According to that, SlowDiv should be renamed into SlowDivision, FastDivInsertionTask into FastDivisionInsertionTask, and isDivisionOp into isDivOp. What do you think?

jlebar added inline comments.Feb 15 2017, 10:18 AM
lib/Transforms/Utils/BypassSlowDivision.cpp
80 ↗(On Diff #88552)

Do we really need this field, since it's just SlowDiv->getOpcode()?

82 ↗(On Diff #88552)

Similarly, do we need this field, since it can be derived from SlowDiv?

You could have a member function for it if it's annoying to get N separate times. I think a member function would be better because it makes it explicit that this is SlowDiv->getType().

84 ↗(On Diff #88552)

Similarly for the MainBB field.

86 ↗(On Diff #88552)

Similarly for the Dividend and Divisor fields.

98 ↗(On Diff #88552)

FastBB and SlowBB, to match the functions right above?

118 ↗(On Diff #88552)

Can some of these be made private and/or deleted now?

126 ↗(On Diff #88552)

Consider using inline initializers for these, otherwise it's easy to add a member variable and forget to initialize it.

class Foo {
  void* ptr = nullptr;
};
128 ↗(On Diff #88552)

Can we do this in the initialization list?

146 ↗(On Diff #88552)

Perhaps

// Skip division on vector types; only optimize integer instructions.
auto* SlowType = dyn_cast<IntegerType>(I->getType());
if (!SlowType) return;
auto BI = BypassWidths.find(SlowType->getBitWidth());

(Make SlowType a local variable, use dyn_cast instead of checking twice, and use auto for the iterator type. Also get rid of the comma splice in the comment and add a period.)

148 ↗(On Diff #88552)

Nit, please add periods at the end of sentences.

166 ↗(On Diff #88552)

previously-computed

177 ↗(On Diff #88552)

auto

181 ↗(On Diff #88552)

It seems like much less of a layering violation to me if insertFastDiv (possibly renamed to createFastDivOrRem) would return Optional<std::pair<Value*, Value*>>. Then we would update the cache here.

185 ↗(On Diff #88552)

s/Cache/cache/

196 ↗(On Diff #88552)

slow div and rem operations

214 ↗(On Diff #88552)

fast div and rem operations

224 ↗(On Diff #88552)

While we're here, probably should add missing word: "because *this* optimization only handles positive numbers"

253 ↗(On Diff #88552)

"Creates a runtime check to test whether both the divisor and dividend fit into BypassType"?

255 ↗(On Diff #88552)

Nit, a linebreak without any intervening whitespace like we have here isn't a meaningful punctuation. Please flow as a single paragraph, or split into two paragraphs by inserting a blank line.

282 ↗(On Diff #88552)

This sentence runs on. We could remove "and the longer-slower div/rem instruction otherwise." Otherwise, can you split into two sentences?

295 ↗(On Diff #88552)

Maybe "Split the basic block before the div/rem."

297 ↗(On Diff #88552)

This line could use a comment, I think.

306 ↗(On Diff #88552)

Perhaps we should update this comment.

324 ↗(On Diff #88552)

I know I wrote it this way initially, but now that I see it...MadeChange = true? :)

80 ↗(On Diff #88213)

During this optimization we generally use 'slow division' to refer to both Div and Rem operations. I believe that strictly following your suggestion would be overkill. For example, the file itself and its main entry point are named BypassSlowDivision.cpp and llvm::bypassSlowDivision. I'm not sure if we should rename them into something like bypassSlowDivisionOrRemainder.

I agree that renaming the file and pass would be overkill. But I think we can still improve readability by being just a little less strict about the suggestion.

Even if we had a comment explaining that "division" == "div or mod" at the top of the file, that would be us inventing a new naming idiom, which is going to add to the cognitive cost to users of understanding our code. And realistically speaking, a large fraction of people aren't going to notice this comment at all, in which case they're just going to be lost.

In contrast, SlowDivOrRem is the same number of characters as SlowDivision, but is unambiguous. I'd also be OK with SlowInstr if you prefer that.

Again I don't think we need to be totally strict about the rule. I don't think that FastDivInsertionTask is ambiguous, for example, although perhaps a better name is FastDivRemInsertionTask would more closely match what it does (it always inserts both instrs). But the SlowDiv member here is, I think, ambiguous, because some of the members here *are* necessarily divs. Similarly, I think createDivRuntimeCheck would be less ambiguous as insertOperandRuntimeCheck or maybe something like insertFastSlowBranch, and insertFastDiv would be better named insertFastDivAndRem.

346 ↗(On Diff #88213)

Oh yes, I like this much better.

The only reason I haven't done this yet is that I don't like very much returning structures by value.

If it makes you feel any better, the calling convention returns large structs "by outparam". :)

n.bozhenov marked 24 inline comments as done.Feb 17 2017, 12:18 PM
n.bozhenov added inline comments.
lib/Transforms/Utils/BypassSlowDivision.cpp
84 ↗(On Diff #88552)

Things get more difficult with MainBB. After splitting the basic block it is not obvious any more what basic block SlowDivOrRem->getParent() will return. So, I believe it's better to keep MainBB member in the class.

181 ↗(On Diff #88552)

Right. I have moved all the caching logic into this routine. As a result, I could get rid of Cache field in the class. Now it's only passed here as a method argument.

306 ↗(On Diff #88552)

Not sure what's wrong with the comment.

jlebar edited edge metadata.Feb 18 2017, 11:44 AM

Sorry to keep going back and forth on this. I'm almost happy, but still have some meaty comments.

lib/Transforms/Utils/BypassSlowDivision.cpp
94 ↗(On Diff #88939)

I think we could still be more explicit about which variables have to do with division specifically and which have to do with div/rem. Right now it's confusing that IsSlowDivision actually means "should we try to generate a fast div/rem for SlowDivOrRem?". Similarly it's confusing that isSignedDiv means "is SlowDivOrRem a signed div or rem?".

Maybe rename "IsSlowDivision" to "IsValid" or "IsValidTask"? And isSignedDiv to isSignedOp to match isDivisionOp?

106 ↗(On Diff #88939)

isSignedOp()?

174 ↗(On Diff #88939)

It's kind of confusing that DivRemResult is "nullable". While the result of insertFastDivAndRem is nullable, the value inside of the cache must not be null.

Seems like it would be clearer if we made DivRemResult not nullable and then returned an optional<DivRemResult> from insertFastDivAndRem.

181 ↗(On Diff #88939)

Perhaps just

CacheI = Cache.Insert({Key, DivRem}).first

? Not sure we need the comment if you write it like this, either.

188 ↗(On Diff #88939)

Nit, perhaps clearer as a ternary:

return isDivisionOp() ? Value.Quotient : Value.Remainder;
193 ↗(On Diff #88939)

"IncomingDivRemPair" is kind a confusing name, because the values are "outgoing" from one block and "incoming" to another.

It's also subtly different from DivRemResult.

It's going to be true that at least one of the Value*s inside our IncomingDivRemPair is an Instruction, right? If so, maybe we should

  • Rename DivRemResult to DivRemPair
  • Get rid of IncomingDivRemPair
  • Add a DivRemPair::basicBlock() method that gets the BB from its Values (assuming that at least one is an Instruction).
84 ↗(On Diff #88552)

sgtm. It's a lot more clear now.

306 ↗(On Diff #88552)

I could have sworn it used to say "identifies division instructions", but now it says "DIV/REM" instructions, which is all I wanted. We're good here.

n.bozhenov updated this revision to Diff 89065.Feb 19 2017, 7:56 AM
n.bozhenov marked an inline comment as done.
n.bozhenov marked 5 inline comments as done.Feb 19 2017, 8:03 AM
n.bozhenov added inline comments.
lib/Transforms/Utils/BypassSlowDivision.cpp
193 ↗(On Diff #88939)

I do agree that DivRemResult and IncomingDivRemPair have something in common and it's tempting to get rid of one of them. On the other hand, they serve different purposes and have no complex logic to share. As a result, I believe these two structures should be kept separate one from another. Not sure about if their names are perfect, though.

DivRemResult is a type that is used to cache results of bypassing. Usually, it is a pair of PHINodes. However, in D29897 there is a special case when it is a pair of ZExt instructions.

IncomingDivRemPair structures are used to create two PHINodes in createDivRemPhiNodes. Each of the IncomingDivRemPair arguments describes one of the incoming values for the PHINodes. Again, in D29897 there is a special case when Quotient = Zero and Remainder = Dividend. In this case it would be impossible to derive information about PHINode predecessor from DivRemPair. So, basic block information should be explicitely included somehow into createDivRemPhiNodes arguments.

Moreover, DivOpInfo and DivRemResult are auxilliary types for caching only. They contain only information significant for caching and it is extremely unlikely that any of these types will be extended in future. On the other hand, one may want to pass some additional information to createDivRemPhiNodes in future (e.g. some sort of metadata) and add additional fields to IncomingDivRemPair. It is not a problem only if DivRemResult and IncomingDivRemPair are two different types.

jlebar added inline comments.Feb 19 2017, 10:33 AM
lib/Transforms/Utils/BypassSlowDivision.cpp
45 ↗(On Diff #89065)

Now we can remove this constructor, right?

172 ↗(On Diff #89065)

It's not really a "div rem" -- it's the *result* of the div and rem, which is a quotient plus a remainder. Maybe just QuotRem or Result?

284 ↗(On Diff #89065)

return None here and elsewhere.

193 ↗(On Diff #88939)

Okay, how about we just name them based on what they contain, QuotRemPair and QuotRemWithBB?

n.bozhenov marked 4 inline comments as done.Feb 20 2017, 10:36 AM
n.bozhenov added inline comments.
lib/Transforms/Utils/BypassSlowDivision.cpp
45 ↗(On Diff #89065)

Yep.

172 ↗(On Diff #89065)

Makes sense.

284 ↗(On Diff #89065)

Right. Thanks.

193 ↗(On Diff #88939)

Definitely, such names have the advantage of being absolutely unambiguous.

jlebar accepted this revision.Feb 20 2017, 10:53 AM

lgtm with comments updated. \o/

lib/Transforms/Utils/BypassSlowDivision.cpp
40 ↗(On Diff #89133)

Please update comment. This is now just a pair, it's not the result of bypassing.

51 ↗(On Diff #89133)

Please update comment. This is now just two values plus a BB. If you want, you can say what the BB is supposed to represent. We should also say how this should be used, rather than describing how it is currently used in the algorithm. Perhaps:

A quotient and remainder, plus a BB from which they logically "originate". If you use Quotient or Remainder in a Phi node, you should use BB as its corresponding predecessor.

This revision is now accepted and ready to land.Feb 20 2017, 10:53 AM
n.bozhenov marked 4 inline comments as done.Feb 21 2017, 5:53 AM

lgtm with comments updated. \o/

Thank you very much for reviewing the patch and very helpful suggestions.

This revision was automatically updated to reflect the committed changes.