This is an archive of the discontinued LLVM Phabricator instance.

[BypassSlowDivision] Do not bypass division of hash-like values
ClosedPublic

Authored by n.bozhenov on Dec 31 2016, 7:06 AM.

Details

Summary

Disable bypassing if one of the operands looks like a hash value. Slow
division often occurs in hashtable implementations, and the fast path is
never taken there because a hash value is extremely unlikely to have
enough of its upper bits set to zero.

A value is considered hash-like if it is produced by:

  1. an XOR operation,
  2. a multiplication by a constant wider than the shorter type, or
  3. a PHI instruction with at least one XOR or MUL+CONST operand.
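
For illustration, here is the kind of code this targets (a C++ sketch, not code from the patch; the constants are the published 64-bit FNV-1a parameters):

#include <cstddef>
#include <cstdint>

// FNV-1a: every round XORs in a byte and multiplies by a 64-bit prime, so
// the result almost always has bits set above the low 32.
uint64_t fnv1a(const char *Data, size_t Len) {
  uint64_t Hash = 14695981039346656037ULL; // FNV offset basis
  for (size_t I = 0; I != Len; ++I) {
    Hash ^= static_cast<unsigned char>(Data[I]); // criterion 1: XOR
    Hash *= 1099511628211ULL;                    // criterion 2: wide constant
  }
  return Hash; // in the IR, the loop makes Hash a PHI (criterion 3)
}

size_t bucketFor(const char *Key, size_t Len, size_t NumBuckets) {
  // This 64-bit remainder is the slow division in question: bypassing it
  // with a 32-bit fast path is wasted work, since the hash practically
  // never fits into 32 bits.
  return fnv1a(Key, Len) % NumBuckets;
}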

Diff Detail

Repository
rL LLVM

Event Timeline

n.bozhenov updated this revision to Diff 82767.Dec 31 2016, 7:06 AM
n.bozhenov retitled this revision from to [BypassSlowDivision] Do not bypass division of hash-like values.
n.bozhenov updated this object.
jlebar added inline comments.Dec 31 2016, 10:20 AM
lib/Transforms/Utils/BypassSlowDivision.cpp
78 ↗(On Diff #82767)

s/the value/a value/

88 ↗(On Diff #82767)

Suggest s/quite// and s/generally// -- we weaken it with "generally" then strengthen it with "quite", and then what do we mean exactly? :)

88 ↗(On Diff #82767)

unlikely that such values will fit into the shorter type

(If you want to use "for", you could use "uncommon for such values to fit into the shorter type", but I think that's less good here.)

105 ↗(On Diff #82767)

s/At this point//

106 ↗(On Diff #82767)

No comma needed after "So" here.

106 ↗(On Diff #82767)

I don't actually understand what you mean by the first sentence. Can you try to rephrase wrt what the bitcast has to do with constant hoisting?

107 ↗(On Diff #82767)

actually a constant

125 ↗(On Diff #82767)

We don't need this case?

157 ↗(On Diff #82767)

I'd just get rid of the last sentence entirely -- it's clear what's going on.

159 ↗(On Diff #82767)

We only want to do this check for the dividend, right? I mean, I guess it doesn't really hurt to do it for the divisor too, but that isn't motivated by hashtables.

test/CodeGen/X86/bypass-slow-division-64.ll
138 ↗(On Diff #82767)

Can we test with a constant that *does* fit into 32 bits?

sanjoy added a subscriber: sanjoy.Dec 31 2016, 10:37 AM

Some minor drop-by comments inline

lib/Transforms/Utils/BypassSlowDivision.cpp
93 ↗(On Diff #82767)

LLVM style is Width, Depth.

118 ↗(On Diff #82767)

Why not:

bool Found = llvm::any_of(P->incoming_values(), [&](Value *V) {
  return isHashLikeValue(V, Width, Depth);
});

if (Found)
  return true;
n.bozhenov updated this revision to Diff 89518.Feb 23 2017, 9:00 AM

The patch has been rebased onto D29897.

n.bozhenov marked 8 inline comments as done.Feb 23 2017, 9:01 AM
n.bozhenov added inline comments.
lib/Transforms/Utils/BypassSlowDivision.cpp
118 ↗(On Diff #82767)

That's cool! But do you think such code would really be easier to read? :)

125 ↗(On Diff #82767)

My idea was to add an explicit do-nothing default branch because we are not going to cover all possible enumeration values. But that doesn't work as intended, because getOpcode() returns a plain unsigned int rather than an enumeration, so the compiler cannot warn about unhandled cases either way. I have deleted the default branch.
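
For context, the switch in question has roughly this shape (a sketch; the helper names are hypothetical, not from the patch):

switch (I->getOpcode()) {
case Instruction::Xor:
  return true;
case Instruction::Mul:
  return isMulByWideConstant(I); // hypothetical helper
case Instruction::PHI:
  return isHashLikePHI(cast<PHINode>(I)); // hypothetical helper
}
return false; // any other opcode is not hash-like; no warning is lost,
              // since the switch is over a plain unsigned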

159 ↗(On Diff #82767)

Basically, yes. This patch was motivated by hashtables and I expect this check to fire mostly for dividends. However, I tried to make the patch as generic as possible. And if we find a divisor that looks like a hash value, disabling bypassing for such a division operation will be a good idea.

test/CodeGen/X86/bypass-slow-division-64.ll
138 ↗(On Diff #82767)

Isn't the next test what you're talking about?

jlebar added inline comments.Feb 23 2017, 2:05 PM
lib/Transforms/Utils/BypassSlowDivision.cpp
228 ↗(On Diff #89518)

Perhaps we can write this as

return C && C->getValue()...

And then below, we can write

return any_of(P->incoming_values(), [&](Value *V) {
  return isHashLikeValue(V, Width, Depth);
});

We could even move the cast into the any_of call.

return any_of(cast<PHINode>(I)->incoming_values(), [&](Value *V) {
  return isHashLikeValue(V, Width, Depth);
});

This is, I think, in the spirit of the LLVM style guide's section about "reducing nesting" and having early returns / breaks.

265 ↗(On Diff #89518)

Maybe s/is supposed to detect/tries to detect/

118 ↗(On Diff #82767)

I do, because it makes the high-level structure of the program explicit. That is, rather than saying "do a loop that does some stuff", we say "check if any elements have property X".
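
For contrast, the loop form presumably being replaced reads something like this (a sketch; only the any_of version appears in the review):

// Explicit loop: the reader has to simulate it to recover the intent.
for (Value *V : P->incoming_values())
  if (isHashLikeValue(V, Width, Depth))
    return true;
return false;

// any_of: the question -- does any incoming value look hash-like? -- is
// stated directly.
return any_of(P->incoming_values(), [&](Value *V) {
  return isHashLikeValue(V, Width, Depth);
});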

test/CodeGen/X86/bypass-slow-division-fnv.ll
3 ↗(On Diff #89518)

Like in the earlier discussion, can we just test this in LLVM IR?

Instead of testing loop unrolling, could we just run your testcase through opt -O3 -unroll-count=4 (maybe we only need unroll-count=2) and check that in? Then here we'd just need to run codegenprepare and check the output of that.

We prefer for unit tests to test small units of code, instead of running the whole optimization pipeline. Whole-pipeline tests are better suited for the test-suite.

10 ↗(On Diff #89518)

again s/is supposed to verify/verifies

12 ↗(On Diff #89518)

Starting with "reorganized", I don't understand what this is saying, or how it meshes with the comment at the top of the file. Can we harmonize these comments and maybe just have one comment instead of two?

n.bozhenov updated this revision to Diff 90229.Mar 1 2017, 2:04 PM
n.bozhenov marked an inline comment as done.

Moved most of the tests into CodeGenPrepare/NVPTX and changed the RUN line in bypass-slow-division-fnv.ll.

n.bozhenov marked 3 inline comments as done.Mar 1 2017, 2:11 PM
n.bozhenov added inline comments.
test/CodeGen/X86/bypass-slow-division-fnv.ll
3 ↗(On Diff #89518)

We prefer for unit tests to test small units of code, instead of running the whole optimization pipeline. Whole-pipeline tests are better suited for the test-suite.

I believe the main difference is that the test-suite is for execution tests while lit tests are for unit testing, isn't it?

Instead of testing loop unrolling, could we just run your testcase through opt -O3 -unroll-count=4 (maybe we only need unroll-count=2) and check that in? Then here we'd just need to run codegenprepare and check the output of that.

That wouldn't make much sense. The recognized patterns are already checked with simpler tests I have put into bypass-slow-div-special-cases.ll. The purpose of this particular file is to make sure that the patterns we recognize are indeed the patterns produced by the middle-end at the maximal optimization level (I have removed the explicit unroll-count option).

The problem this test tries to solve is that the PHI heuristic in isHashLikeValue is quite fragile. For example, it could be broken if some loop optimization (unrolling, vectorization, whatever) produced slightly different code with an additional instruction between the XOR and the REM. This test verifies that there are no such additional instructions and that the code is correctly recognized as hash-like.

And since the set of middle-end optimizations and their parameters is target-specific, I would like to keep this test X86-specific. It is not an automatically generated test, it has only one CHECK statement, and therefore it is not a problem to have it as an llc test.

jlebar edited edge metadata.EditedMar 2 2017, 3:42 PM

That wouldn't make much sense. The recognized patterns are already checked with simpler tests I have put into bypass-slow-div-special-cases.ll. The purpose of this particular file is to make sure that the patterns we recognize are indeed the patterns produced by the middle-end at the maximal optimization level (I have removed the explicit unroll-count option).

It sounds like there's an implicit "on this particular testcase" at the end of this sentence. "We are checking that we recognize the patterns produced by the middle-end at -O3 *on this particular testcase*."

The problem this test tries to solve is that the PHI heuristic in isHashLikeValue is quite fragile. For example, it could be broken if some loop optimization (unrolling, vectorization, whatever) produced slightly different code with an additional instruction between the XOR and the REM. This test verifies that there are no such additional instructions and that the code is correctly recognized as hash-like.

It seems to me that if you think the code is fragile as-is, we should try to make it less fragile, so that a test targeting one specific testcase isn't necessary. For instance, we could increase the search depth some? This would obviate the need for an uber-specific, full-pipeline testcase. It would also keep us honest, in that we're not optimizing for one particular chunk of code.

And since the set of middle-end optimizations and their parameters is target-specific, I would like to keep this test X86-specific. It is not an automatically generated test, it has only one CHECK statement, and therefore it is not a problem to have it as an llc test.

If we really wanted to run the full pipeline, I don't see why we couldn't simply run opt -O3 | opt -codegenprepare. Can you think of a specific bug in this pass that the llc test would catch but this would not?

That wouldn't make much sense. The recognized patterns are already checked with simpler tests I have put into bypass-slow-div-special-cases.ll. The purpose of this particular file is to make sure that the patterns we recognize are indeed the patterns produced by the middle-end at the maximal optimization level (I have removed the explicit unroll-count option).

It sounds like there's an implicit "on this particular testcase" at the end of this sentence. "We are checking that we recognize the patterns produced by the middle-end at -O3 *on this particular testcase*."

Of course, on this particular testcase. But it is not some arbitrary testcase. While working on this patch I wanted to come up with a heuristic that recognizes the most important hashing algorithms, and the FNV algorithms are definitely among them: for example, both libstdc++ and libc++ use them to hash strings.

Unfortunately, the implemented heuristic is still incapable of disabling division bypassing in std::unordered_map<std::string, whatever>. The problem is that the calculated hash value is stored to memory before the division, so the def-use traversal never reaches the XOR and MUL that produced it. So, while I'm happy with the XOR and MUL heuristics, I'm not quite satisfied with the PHI heuristic.

The problem this test tries to solve is that the PHI heuristic in isHashLikeValue is quite fragile. For example, it could be broken if some loop optimization (unrolling, vectorization, whatever) produced slightly different code with an additional instruction between the XOR and the REM. This test verifies that there are no such additional instructions and that the code is correctly recognized as hash-like.

It seems to me that if you think the code is fragile as-is, we should try to make it less fragile, so that a test targeting one specific testcase isn't necessary. For instance, we could increase the search depth some? This would obviate the need for an uber-specific, full-pipeline testcase. It would also keep us honest, in that we're not optimizing for one particular chunk of code.

Just increasing the search depth doesn't sound right to me, because we would risk many false positives.

After thinking it over once again, my idea now is to change the PHI heuristic. I believe a better approach would be to require all incoming values to be long, instead of using any_of, with a check like

return all_of(P->incoming_values(), [&](Value *V) {
  return getValueRange(V) == VALRNG_LONG;
});

This way it would also be safe (no false positives) to increase the search depth and probably make it unlimited.

I will try to prepare a new patch and upload it by tomorrow.

n.bozhenov updated this revision to Diff 90608.Mar 5 2017, 6:43 AM

Changed the PHI heuristic as planned.

And since the set of middle-end optimizations and their parameters is target-specific, I would like to keep this test X86-specific. It is not an automatically generated test, it has only one CHECK statement, and therefore it is not a problem to have it as an llc test.

If we really wanted to run the full pipeline, I don't see why we couldn't simply run opt -O3 | opt -codegenprepare. Can you think of a specific bug in this pass that the llc test would catch but this would not?

The reason is that opt -codegenprepare doesn't run BypassSlowDivision for X86. So, I have to run the whole X86 backend to check that the division is not bypassed.

I believe that the new version of the heuristic is much more robust than the previous one. However, a naive implementation didn't work initially. It turned out that hash values weren't recognized because of PHI nodes with undef incoming values generated by loop-unroll. I had to amend the patch to work around the problem.
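
A sketch of what that workaround might look like (assuming the all_of form proposed earlier; the actual patch may differ):

return all_of(P->incoming_values(), [&](Value *V) {
  // Undef incoming values are an artifact of loop unrolling and carry no
  // range information, so treat them as neutral rather than as short.
  return isa<UndefValue>(V) || getValueRange(V) == VALRNG_LONG;
});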

Earlier I had problems with bitcast instructions: only after running the whole-pipeline test did I find that long constants are sometimes hidden behind bitcast instructions.

Obviously, a manually created test would never contain such unexpected things.

Previously you suggested manually running the testcase through opt -O3 ... and checking that in. This wouldn't work either: some future change in the middle-end could break the pattern recognition in codegenprepare, and we would never find out because our codegen input would stay the same.

So, I believe that it is worth having a full pipeline test to catch anything unexpected coming from the middle-end and breaking the pattern recognition.

jlebar requested changes to this revision.Mar 5 2017, 11:01 AM
jlebar added a subscriber: hfinkel.

The reason is that opt -codegenprepare doesn't run BypassSlowDivision for X86. So, I have to run the whole X86 backend to check that the division is not bypassed.

We can't make it an NVPTX test like we did the last time we had this problem?

I believe that the new version of the heuristic is much more robust than the previous one. However, a naive implementation didn't work initially. It turned out that hash values weren't recognized because of PHI nodes with undef incoming values generated by loop-unroll. I had to amend the patch to work around the problem.

It doesn't look like you added any "unit" tests to cover this changed behavior. Was that intentional?

Previously you suggested manually running the testcase through opt -O3 ... and checking that in. This wouldn't work either: some future change in the middle-end could break the pattern recognition in codegenprepare, and we would never find out because our codegen input would stay the same.

Sure. It's also true that clang might change how it emits this code, or that you might have a front-end that isn't clang and also emits this code differently.

Ultimately I'm not comfortable LGTM'ing this patch so long as it contains this -O3 test. This is very different than how I understood we write tests. But that might just be my ignorance speaking; what you're trying to do may in fact be perfectly fine; I really don't know enough to say for sure. Let's get a second opinion? If @hfinkel or someone thinks this is the right approach, I'm happy.

lib/Transforms/Utils/BypassSlowDivision.cpp
85 ↗(On Diff #90608)

I would prefer VisitedSetTy or something like that.

85 ↗(On Diff #90608)

SmallSet<T*> is a typedef for SmallPtrSet. There are 3x as many mentions of SmallPtrSet as SmallSet in LLVM (that's counting all SmallSets, not just SmallSets of pointers), so I think we should call this SmallPtrSet (and change the #include).

229 ↗(On Diff #90608)

Can we add a comment explaining why we return true here?

230 ↗(On Diff #90608)

Sounds like we should call this VisitedPhis?

231 ↗(On Diff #90608)

Why the change from any_of to all_of? You said earlier:

This way it would also be safe (no false positives) to increase the search depth and probably make it unlimited.

I am not sure what you mean by this. Are you making two claims about this change:

a. It's "safe" in that it reduces our false positives, and
b. It lets us safely increase the search depth to infinity?

I don't see how the switch to all_of changes anything with respect to the cost of the recursion. I agree that pretty much any sane code is not going to have deep recursion here. But that's not what I'm worried about. I'm worried about some crazy code that causes us to have quadratic behavior. *That's* what the depth limit is there to protect against.

With respect to (a) I see "safety" opposite of how you've described it. If we declare that something is hash-like, we won't optimize it, and not optimizing is the "safe" thing to do. Therefore the safe/conservative choice is to declare something as "maybe hash-like". Therefore any_of is safer / more conservative than all_of.

I do have a major safety concern here, but it's about how isHashLikeValue is being used in getValueRange. Regardless of whether we use any_of or all_of here, it is *not* safe to return VALRNG_LONG if isHashLikeValue(V). If we return VALRNG_LONG, that is a *promise* that the value is long, and we take advantage of this explicit promise. isHashLikeValue is *not* a promise that the value is long; it's just a heuristic. So that whole thing needs to change. Please also add tests to catch this tricky bug.

(Perhaps you were accidentally working around this bug in getValueRange by using all_of here and that's the source of this confusion.)

This revision now requires changes to proceed.Mar 5 2017, 11:01 AM
n.bozhenov updated this revision to Diff 93007.Mar 24 2017, 2:07 PM
n.bozhenov edited edge metadata.

I believe that the new version of the heuristic is much more robust than the previous one. However, a naive implementation didn't work initially. It turned out that hash values weren't recognized because of PHI nodes with undef incoming values generated by loop-unroll. I had to amend the patch to work around the problem.

It doesn't look like you added any "unit" tests to cover this changed behavior. Was that intentional?

I slightly updated a few tests for the newer heuristic, but generally the tests stay the same. It was intentional, yes.

Ultimately I'm not comfortable LGTM'ing this patch so long as it contains this -O3 test. This is very different than how I understood we write tests. But that might just be my ignorance speaking; what you're trying to do may in fact be perfectly fine; I really don't know enough to say for sure. Let's get a second opinion? If @hfinkel or someone thinks this is the right approach, I'm happy.

Ok. I have removed the test in question. Probably I kind of abused the lit testing infrastructure.

n.bozhenov marked 2 inline comments as done.Mar 24 2017, 2:23 PM
n.bozhenov added inline comments.
lib/Transforms/Utils/BypassSlowDivision.cpp
230 ↗(On Diff #90608)

In the current implementation only PHI nodes are visited, but in the future it could be extended to cover SELECT instructions as well. I'm not sure we should put Phi into the name.

231 ↗(On Diff #90608)

Why the change from any_of to all_of? You said earlier:

This way it would also be safe (no false positives) to increase the search depth and probably make it unlimited.

I am not sure what you mean by this. Are you making two claims about this change:

a. It's "safe" in that it reduces our false positives, and
b. It lets us safely increase the search depth to infinity?

Yes, I'm making both of these claims. Previously I considered a PHINode to be hash-like if any of its inputs was hash-like. If I made the search depth unlimited, a multiplication by a long constant would disable bypassing for all divisions reachable from it via a chain of PHINodes, even if at runtime another input value were fed into those divisions.

But in the current implementation I require all incoming values of a PHINode to be hash-like, which is a much stricter restriction. Because of that, it is safe (conservative) to increase the search depth: a single multiplication cannot poison some distant division.
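
A hypothetical C++ illustration of the difference (standing in for the IR; the names are invented):

uint64_t pick(uint64_t A, uint64_t B, uint64_t D, bool Sel) {
  uint64_t H = A * 1099511628211ULL; // hash-like: multiply by wide constant
  uint64_t V = Sel ? H : B;          // becomes a PHI node in the IR
  // With any_of, the single hash-like input H marks the PHI hash-like and
  // this division is never bypassed, even when Sel is false at runtime and
  // a short B flows through. With all_of, the unknown input B keeps the
  // PHI unclassified and the division can still be bypassed.
  return V / D;
}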

I don't see how the switch to all_of changes anything with respect to the cost of the recursion. I agree that pretty much any sane code is not going to have deep recursion here. But that's not what I'm worried about. I'm worried about some crazy code that causes us to have quadratic behavior. *That's* what the depth limit is there to protect against.

As we stop traversal of the IR after finding the first value that is neither a PHINode nor a MUL/XOR, the number of traversed instructions is small for any non-malicious IR. And the recursion depth is limited by the length of the longest chain of consecutive PHINodes.

With respect to (a) I see "safety" opposite of how you've described it. If we declare that something is hash-like, we won't optimize it, and not optimizing is the "safe" thing to do. Therefore the safe/conservative choice is to declare something as "maybe hash-like". Therefore any_of is safer / more conservative than all_of.

My point is that without this patch all long divisions are bypassed, and generally that is a very useful optimization. But in some rare cases the optimization is useless (not harmful) and we would like to disable it. Such disabling is quite risky, because not bypassing the wrong division may cause a significant performance degradation. So, here "safety" means "bypass when in doubt".

I do have a major safety concern here, but it's about how isHashLikeValue is being used in getValueRange. Regardless of whether we use any_of or all_of here, it is *not* safe to return VALRNG_LONG if isHashLikeValue(V). If we return VALRNG_LONG, that is a *promise* that the value is long, and we take advantage of this explicit promise. isHashLikeValue is *not* a promise that the value is long; it's just a heuristic. So that whole thing needs to change. Please also add tests to catch this tricky bug.

(Perhaps you were accidentally working around this bug in getValueRange by using all_of here and that's the source of this confusion.)

Actually, the comment at the VALRNG_LONG declaration says that it is not a promise. VALRNG_LONG means that a value is *unlikely* to fit into the shorter type. If we needed a promise, we would use ValueTracking only and would never have introduced an additional analysis in isHashLikeValue. But for this particular optimization it doesn't matter whether a value is *guaranteed* to be long or just *likely* to be long.

jlebar requested changes to this revision.Mar 25 2017, 1:14 PM

Actually, the comment at the VALRNG_LONG declaration says that it is not a promise.

Ah, you're right. I thought that the code that did "no division is needed at all: The quotient is 0 and the remainder is equal to Dividend." checked VALRNG_LONG, but it doesn't.

Regardless of what the comments say, I think this reveals that it is very confusing that we have VALRNG_SHORT, which is a promise, and VALRNG_LONG, which is not.

Maybe we should rename them to VALRNG_KNOWN_SHORT and VALRNG_LIKELY_LONG?
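
Under that proposal the enum would read roughly like this (a sketch; the VALRNG_UNKNOWN member is an assumption about the middle case, not confirmed by the review):

enum ValueRange {
  VALRNG_KNOWN_SHORT, // a promise: the operand provably fits the short type
  VALRNG_UNKNOWN,     // nothing known; emit the runtime check
  VALRNG_LIKELY_LONG  // a heuristic: the operand probably does not fit
};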

I slightly updated a few tests for the newer heuristic, but generally the tests stay the same. It was intentional, yes.

I can't approve a patch if its tests don't cover its behavior, sorry. If the behavior was worth changing between versions of the patch, it needs test coverage.

As we stop traversal of the IR after finding the first value that is neither a PHINode nor a MUL/XOR, the number of traversed instructions is small for any non-malicious IR. And the recursion depth is limited by the length of the longest chain of consecutive PHINodes.

It's a design goal of LLVM to handle even pathological IR. I think this applies here. I cannot approve this patch if it blows up on IR with many consecutive phi nodes, sorry.

lib/Transforms/Utils/BypassSlowDivision.cpp
230 ↗(On Diff #93007)

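// insert() returns an (iterator, bool) pair whose bool is false when I is
// already in the set, so this records and tests membership in one call: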
if (!Visited.insert(I).second)

This revision now requires changes to proceed.Mar 25 2017, 1:14 PM
n.bozhenov updated this revision to Diff 93086.Mar 26 2017, 3:31 PM
n.bozhenov edited edge metadata.

Actually, the comment at the VALRNG_LONG declaration says that it is not a promise.

Ah, you're right. I thought that the code that did "no division is needed at all: The quotient is 0 and the remainder is equal to Dividend." checked VALRNG_LONG, but it doesn't.

Regardless of what the comments say, I think this reveals that it is very confusing that we have VALRNG_SHORT, which is a promise, and VALRNG_LONG, which is not.

Maybe we should rename them to VALRNG_KNOWN_SHORT and VALRNG_LIKELY_LONG?

Done.

I slightly updated a few tests for the newer heuristic, but generally the tests stay the same. It was intentional, yes.

I can't approve a patch if its tests don't cover its behavior, sorry. If the behavior was worth changing between versions of the patch, it needs test coverage.

I have added one test case which is handled differently by the new and old versions of the patch.

As we stop traversal of the IR after finding the first value that is neither a PHINode nor a MUL/XOR, the number of traversed instructions is small for any non-malicious IR. And the recursion depth is limited by the length of the longest chain of consecutive PHINodes.

It's a design goal of LLVM to handle even pathological IR. I think this applies here. I cannot approve this patch if it blows up on IR with many consecutive phi nodes, sorry.

I have limited the number of visited PHINodes for a single division. Now no more than 16 PHINodes are traversed before giving up.
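
A sketch of that cap (the limit of 16 as stated above; the set name, signature, and surrounding code are schematic, not the committed source):

case Instruction::PHI: {
  auto *P = cast<PHINode>(I);
  // Give up on pathological IR: traverse at most 16 PHI nodes per division,
  // so long chains of PHIs cannot blow up the search.
  if (Visited.size() >= 16)
    return false;
  if (!Visited.insert(P).second)
    return false; // already seen: break cycles of PHIs
  return all_of(P->incoming_values(), [&](Value *V) {
    return isa<UndefValue>(V) || isHashLikeValue(V, Visited);
  });
}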

This revision is now accepted and ready to land.Mar 27 2017, 9:15 AM
This revision was automatically updated to reflect the committed changes.