This is an archive of the discontinued LLVM Phabricator instance.

[DivRemHoist] add a pass to move div/rem pairs into the same block (PR31028)
ClosedPublic

Authored by spatel on Aug 24 2017, 2:54 PM.

Details

Summary

This is intended to be the same functionality as D31037 (EarlyCSE) but implemented as an independent pass, so there's no stretching of scope or feature creep for an existing pass. I also proposed a weaker version of this for SimplifyCFG in D30910, and I initially had almost this same functionality as another lump of CGP in the motivating example of PR31028 ( https://bugs.llvm.org/show_bug.cgi?id=31028 ). It's been a long road. :)

The advantage of positioning this ahead of SimplifyCFG / InstCombine in the pass pipeline is that we'll reduce the positive test cases to a single block:

%rem = urem i8 %a, %b
%div = udiv i8 %a, %b
%cmp = icmp eq i8 %div, 42
%sel = select i1 %cmp, i8 %rem, i8 3
ret i8 %sel

...and that can lead to better codegen even for targets that don't have a joint divrem instruction. D30910 has an AArch64 example of that.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel created this revision.Aug 24 2017, 2:54 PM
efriedma added inline comments.Aug 24 2017, 4:06 PM
lib/Transforms/Scalar/DivRemHoist.cpp
86 ↗(On Diff #112612)

Why is this assertion true? I don't see why two instructions with the same operands must have a dominance relation.

spatel added inline comments.Aug 24 2017, 4:16 PM
lib/Transforms/Scalar/DivRemHoist.cpp
86 ↗(On Diff #112612)

Yes, you're right. The examples I was looking at weren't very diverse. Will fix.

davide added a subscriber: davide.Aug 24 2017, 4:45 PM
spatel updated this revision to Diff 112690.Aug 25 2017, 6:35 AM

Patch updated:
Fixed to check that one op must dominate the other to allow hoisting, and added a test for that.

dberlin added inline comments.Aug 25 2017, 8:54 AM
lib/Transforms/Scalar/DivRemHoist.cpp
78 ↗(On Diff #112690)

You should check dominance of the parents, not the instructions. Though you currently avoid it, DT->dominates is a linear-time test if you use it on instructions in the same block (and that's not obvious here).

The recommendation is to use OrderedInstructions instead if you need to check same-block dominance, but you don't :)

83 ↗(On Diff #112690)

It should go after, to retain the relative ordering.

89 ↗(On Diff #112690)

Ditto, this should be after.

spatel updated this revision to Diff 112732.Aug 25 2017, 1:10 PM
spatel marked 2 inline comments as done.

Patch updated:

  1. Check dominance of the basic blocks rather than the instructions to avoid potential unexpected complexity.
  2. Move the hoisted instruction after (rather than before) the other instruction using removeFromParent()+insertAfter().
lib/Transforms/Scalar/DivRemHoist.cpp
83 ↗(On Diff #112690)

Agreed, that feels better. But now I remember why I picked "moveBefore"... there is no "moveAfter". Is this an oversight in the API, or is there something about 'after' that requires different logic? Looks like other places do:

I->removeFromParent();
I->insertAfter(OtherInst);
dberlin added inline comments.Aug 25 2017, 5:11 PM
lib/Transforms/Scalar/DivRemHoist.cpp
83 ↗(On Diff #112690)

This seems strange to me. Eli tends to be much better than I am at explaining the good reason something is the way it is that I've missed, so let's see what he says.

MemorySSA also uses iplists and has moveAfter.

All it requires there is the equivalent of:

moveBefore(*MovePos->getParent(), ++MovePos->getIterator());

(since splice inserts before)

efriedma added inline comments.Aug 25 2017, 5:43 PM
lib/Transforms/Scalar/DivRemHoist.cpp
83 ↗(On Diff #112690)

I can't think of any reason moveAfter doesn't exist. Looks like it's just an oversight.

dberlin edited edge metadata.Aug 25 2017, 5:55 PM

Okay.
Then feel free to either add it, or just use the canonical idiom, and i'll add it and clean them all up.
Your call.

Okay.
Then feel free to either add it, or just use the canonical idiom, and i'll add it and clean them all up.
Your call.

If there are no other suggestions for this patch, I'd prefer to commit this as-is (with a FIXME comment) to get more testing underway and then come back for the clean-up.

I do have a few other questions (this is my first try at writing a new pass):

  1. Are there guidelines/intuition about where to position this in the opt pipeline? It's currently near the end, but I don't have a good reason for that. I just think that it should be ahead of at least one SimplifyCFG to allow flattening.
  2. Are there guidelines for when to enable this? It should be cheap in compile-time, so I put in -O1. But it's also rare, so that may be an argument for -O2 or even -O3?
  3. Any thoughts about the TODO for PreservedAnalyses near the end of DivRemHoist.cpp?
spatel updated this revision to Diff 113128.Aug 29 2017, 11:50 AM

Patch updated:

  1. Use moveAfter() to simplify the hoisting code (convenience function added with rL312001).
  2. Move the pass later in the pipeline. I noticed that the transform wasn't holding in loops (example below) because loop transforms will sink the sibling op to its use. That defeats the point of this pass, so this time I've made the transform really late, but still before the final -simplifycfg.
  3. Add the pass to the new pass manager pipeline in the equivalent position. I failed to add the pass at all in the last rev (!).
  4. Adding to the new pipeline means there are actually tests to check that the pass is running, so updated those.
  5. Added preservation of GlobalsAA for both pipelines. The lack of this was causing a test failure in test/Transforms/PhaseOrdering/globalaa-retained.ll with the previous placement of the pass. I'm not sure if that would still happen now, but it's safer to make that explicit?

Here's an example of a loop with div/rem where we do not want another pass to re-sink the rem (because codegen can't fix that). Should I add a PhaseOrdering test like this?

define void @rebase_mask(i32 %n, i32 %divisor, i32* %mask) {
entry:
  %cmp = icmp sgt i32 %n, 0
  br i1 %cmp, label %preheader, label %exit

preheader:
  br label %for.body

exit:
  ret void

for.body:
  %i = phi i32 [ %inc, %cleanup ], [ 0, %preheader ]
  %div = sdiv i32 %i, %divisor
  %idxprom = sext i32 %div to i64
  %arrayidx = getelementptr inbounds i32, i32* %mask, i64 %idxprom
  %ld = load i32, i32* %arrayidx, align 4
  %cmp1 = icmp slt i32 %ld, 0
  br i1 %cmp1, label %cleanup, label %if.end

if.end:
  %rem = srem i32 %i, %divisor
  store i32 %rem, i32* %arrayidx, align 4
  br label %cleanup

cleanup:
  %inc = add nuw nsw i32 %i, 1
  %exitcond = icmp eq i32 %inc, %n
  br i1 %exitcond, label %exit, label %for.body
}

On closer inspection, I misdiagnosed who was sinking the rem in my example, but I don't think it makes a difference to the patch. We don't need loops to show the problem...if simplifycfg can't flatten the code, then it's instcombine sinking instructions for some unstated reason:

define void @rebase_mask(i1 %cmp, i32 %num, i32 %divisor, i32* %p1, i32* %p2) {
entry:
  %div = sdiv i32 %num, %divisor
  store i32 %div, i32* %p1
  br i1 %cmp, label %exit, label %if.end

if.end:
  %rem = srem i32 %num, %divisor
  store i32 %rem, i32* %p2
  br label %exit

exit:
  ret void
}

./opt -div-rem-hoist -instcombine divrem.ll -S -debug
...
IC: Sink: %rem = srem i32 %num, %divisor

InstCombine sinks an instruction with a single use into the successor block containing that use, when that block has no other predecessors.
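
Roughly, that rule could be expressed as the following standalone check (a hypothetical helper paraphrasing the behavior described above, not InstCombine's actual code):

#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Sketch of the rule described above: an instruction with exactly one use
// may be sunk into the user's block when that block's only predecessor is
// the instruction's current block.
static bool wouldSinkIntoUser(Instruction &I) {
  if (!I.hasOneUse())
    return false;
  auto *UserInst = cast<Instruction>(*I.user_begin()); // the single user
  BasicBlock *DestBB = UserInst->getParent();
  return DestBB != I.getParent() &&
         DestBB->getSinglePredecessor() == I.getParent();
}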

hfinkel added a subscriber: hfinkel.Sep 5 2017, 3:35 PM
hfinkel added inline comments.
lib/Transforms/Scalar/DivRemHoist.cpp
84 ↗(On Diff #113128)

Please don't hard-code 1 here as the basis for comparison to the cost model. The normalization is target specific. Feel free to compare it to a corresponding cost for Instruction::Add, for example.

Also, to be clear, you're using the reciprocal-throughput model here. So if we can perform two adds every cycle, and that has a cost of 1, and we can do only one multiply per cycle, it would have a cost of 2. Is that too much?

You might want to use the user-cost model (which is more like code size in some sense), as that's what SimplifyCFG uses to do speculation. You could call TTI->getOperationCost and compare it to TargetTransformInfo::TCC_Basic. We'll need to clean all of this up at some point, but I think that would be akin to what SimplifyCFG is doing for speculation costs.

spatel added inline comments.Sep 5 2017, 4:28 PM
lib/Transforms/Scalar/DivRemHoist.cpp
84 ↗(On Diff #113128)

Ah, right. I should've remembered all that since I've complained about it before!
Let me take another look at the current state of those cost models and do something less obnoxious than a magic number.

spatel updated this revision to Diff 114020.Sep 6 2017, 9:37 AM

Patch updated:
As I was rummaging around in the cost models (realizing they're quite wrong for div/rem/mul in their own ways) and trying to bend them to express what I wanted...

I decided it would be better to just add a new TTI shim (hasDivRemOp()) to implement the behavior that I initially had when this was a CGP add-on. I realize we're at the edge of IR vs. backend transforms here, so let me know if this is wrong. Based on the existing hooks for things like masked ops, however, this doesn't look out of place to me.

I also figured it was best to just replace a non-hoistable rem instruction in-place and avoid getting into the inaccurate cost model mess to decide when hoisting would be the right thing to do. So now the i128 test for x86 shows that optimization instead of just bailing out. I think that's also the right thing to do for all targets that don't have HW divrem (everything in trunk besides x86?).

hfinkel added a comment.

Is this worthwhile only if the target has a combined div/rem operation? Isn't the multiplication plus subtraction generally cheaper than an independent remainder operation in any case? Maybe, in the latter case, you simply want to do that operational replacement instead of hoisting?

include/llvm/Analysis/TargetTransformInfo.h
452 ↗(On Diff #114020)

I think that you might as well add a Boolean parameter indicating whether this is for a signed or an unsigned division (and then, in the X86 implementation, you can test the right ISD opcode for each).
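
A rough sketch of what such an X86 override might look like (illustrative, not necessarily the committed code; TLI and DL are assumed to be the usual members available in the X86 TTI implementation class):

// Report a combined div/rem only when the corresponding ISD node is legal
// for this type; the IsSigned flag selects SDIVREM vs. UDIVREM.
bool X86TTIImpl::hasDivRemOp(Type *DataType, bool IsSigned) {
  EVT VT = TLI->getValueType(DL, DataType);
  return VT.isSimple() &&
         TLI->isOperationLegal(IsSigned ? ISD::SDIVREM : ISD::UDIVREM, VT);
}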

spatel added a comment.Sep 6 2017, 1:29 PM

Is this worthwhile only if the target has a combined div/rem operation? Isn't the multiplication plus subtraction generally cheaper than an independent remainder operation in any case? Maybe, in the latter case, you simply want to do that operational replacement instead of hoisting?

No. Sorry if this wasn't clear - we're doing the replacement for all targets, not just x86. The hoisting of rem is the special case for x86.

include/llvm/Analysis/TargetTransformInfo.h
452 ↗(On Diff #114020)

Yes - let me update that; that was just an oversight.

spatel updated this revision to Diff 114073.Sep 6 2017, 2:35 PM

My earlier comments about this in Phab don't seem to have made it to the email list, so I'll send directly to the list if this doesn't either...

Patch updated:

  1. Add bool to properly check availability of divrem with correct signedness.
  2. Add PPC target to test file to show that the rem replacement part of the patch works for all targets.
hfinkel added inline comments.Sep 6 2017, 3:00 PM
lib/Transforms/Scalar/DivRemHoist.cpp
50 ↗(On Diff #114073)

I'm missing something: where is DivRemMapKey defined?

85 ↗(On Diff #114073)

I don't understand what's going on here; can you please explain in the comment? Why is this way cheap but hoisting the rem near the div is not? I'm guessing this has something to do with the fact that DAGCombiner::useDivRem isn't called for divisions if TLI.isIntDivCheap returns true, but is always called on remainders? If that's it, should we mirror that logic more directly here (i.e. somehow directly incorporate the result of calling TLI.isIntDivCheap)?

spatel added inline comments.Sep 6 2017, 3:19 PM
lib/Transforms/Scalar/DivRemHoist.cpp
50 ↗(On Diff #114073)

This is an existing struct used for a different div/rem optimization. It lives in:
#include "llvm/Transforms/Utils/BypassSlowDivision.h"

85 ↗(On Diff #114073)

This is independent of anything in the backend. That's why this transform is frustrating - it's half target-independent. :)

Let's see if some more words or repeating the formula would make it better:
"Hoist the div into the rem block. No cost calculation is needed because division is an implicit component of remainder:
X % Y --> X - ((X / Y) * Y)
If the target has a unified instruction for div/rem, then this will occur in a single instruction. If the target does not have a unified instruction for div/rem, then it must calculate remainder as (sub X, (mul (div X, Y), Y)). Either way, hoisting division will be free."
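
A minimal sketch of that decomposition at the IR level (illustrative only: decomposeRem is a hypothetical helper, and it assumes the div already computes X / Y with the same operands and dominates the rem):

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Rebuild X % Y as X - ((X / Y) * Y) next to an existing division, so the
// backend sees a single div feeding both results.
static void decomposeRem(BinaryOperator *Div, BinaryOperator *Rem) {
  IRBuilder<> Builder(Rem);                 // insert right before the rem
  Value *X = Rem->getOperand(0);
  Value *Y = Rem->getOperand(1);
  Value *Mul = Builder.CreateMul(Div, Y);   // (X / Y) * Y
  Value *Sub = Builder.CreateSub(X, Mul);   // X - ((X / Y) * Y)
  Rem->replaceAllUsesWith(Sub);
  Rem->eraseFromParent();
}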

hfinkel added inline comments.Sep 6 2017, 5:28 PM
lib/Transforms/Scalar/DivRemHoist.cpp
50 ↗(On Diff #114073)

Ah, okay.

85 ↗(On Diff #114073)

That is much better, thanks. However, are you making that substitution? It looks like you're just moving the division in this case (i.e. only doing DivInst->moveAfter(RemInst);)?

spatel added inline comments.Sep 6 2017, 5:43 PM
lib/Transforms/Scalar/DivRemHoist.cpp
85 ↗(On Diff #114073)

We can just move the division in this pass because we can assume that the backend will be able to simplify the common division when the div and rem are in the same block.

I suppose at this point we could decompose the dominating rem into div+mul+sub if we know the backend does not have a divrem. I don't think there would be any downside to that (other than we're repeating functionality that is known to exist in the backend)?

hfinkel added inline comments.Sep 6 2017, 5:52 PM
lib/Transforms/Scalar/DivRemHoist.cpp
85 ↗(On Diff #114073)

We can just move the division in this pass because we can assume that the backend will be able to simplify the common division when the div and rem are in the same block.

That's okay, but you should say that in the comment. However...

I suppose at this point we could decompose the dominating rem into div+mul+sub if we know the backend does not have a divrem. I don't think there would be any downside to that (other than we're repeating functionality that is known to exist in the backend)?

If this pass were running early in the pipeline, we'd definitely need to do this (because it would affect inlining/unrolling costs and the like). Because this runs relatively late, that's not a huge concern. However, I'd prefer that you do the decomposition here at the IR level anyway because a) you already have the code here to do it (so there's little added complexity) and b) in case there are any transformations later that are still using IR-level costs, and there very well might be in the backend (even target-specific ones), we might as well be kind to them and give them more-accurate costs.

spatel added inline comments.Sep 7 2017, 7:09 AM
lib/Transforms/Scalar/DivRemHoist.cpp
85 ↗(On Diff #114073)

Agreed - now that we're doing replacement, we might as well make both cases do it.

This does raise a question: what should this pass be called now?
div-rem-cse-or-hoist
div-rem-opt
div-rem-twins
...?

Also, should I rename BypassSlowDivision.h to DivRem____ to make the home of DivRemMapKey more apparent?

spatel updated this revision to Diff 114195.Sep 7 2017, 9:50 AM

Patch updated:

  1. If the target doesn't have a divrem instruction, always decompose the rem.
  2. Rename everything to 'DivRemPairs' to reflect the greater powers of the pass (open to other suggestions).
  3. Improve code comments to better explain how this works.
hfinkel added inline comments.Sep 7 2017, 4:54 PM
lib/Transforms/Scalar/DivRemPairs.cpp
80 ↗(On Diff #114195)

I'd like to remove this check. There are two reasons for this:

  1. Consistency (ending up with the expanded form if we started with the operations in different blocks, but not if they started in the same block, seems unfortunate).
  2. To allow us to get optimal results reliably in the backend in a straightforward way. For example, if a target does not have a legal rem instruction, we get the desired behavior because the rem will be legalized into the code with the div, and SDAG will reuse an existing identical div in the same block. If there is a legal rem instruction, however, this doesn't work automatically. In fact, the PowerPC backend has this comment:
// PowerPC has no SREM/UREM instructions unless we are on P9
// On P9 we may use a hardware instruction to compute the remainder.
// The instructions are not legalized directly because in the cases where the
// result of both the remainder and the division is required it is more
// efficient to compute the remainder from the result of the division rather
// than use the remainder instruction.
if (Subtarget.isISA3_0()) {
  setOperationAction(ISD::SREM, MVT::i32, Custom);
  setOperationAction(ISD::UREM, MVT::i32, Custom);
  setOperationAction(ISD::SREM, MVT::i64, Custom);
  setOperationAction(ISD::UREM, MVT::i64, Custom);
} else {
  setOperationAction(ISD::SREM, MVT::i32, Expand);
  setOperationAction(ISD::UREM, MVT::i32, Expand);
  setOperationAction(ISD::SREM, MVT::i64, Expand);
  setOperationAction(ISD::UREM, MVT::i64, Expand);
}

and if we always expand here, I think we can clean this up.

I think that you can do this just by writing:

if (HasDivRemOp && RemBB == DivBB)
  continue;

bool DivDominates = DT.dominates(DivInst, RemInst);

What Danny said earlier about the instruction-level dominance checks being expensive within the same block is true, but there's no extra expense compared to the BB version if the instructions are in different BBs. EarlyCSE should ensure that there is at most one div or rem with any particular set of arguments, so I don't think we need to do anything special to deal with algorithmic-complexity issues (although if we're really concerned about that, we can use OrderedInstructions).

93 ↗(On Diff #114195)

I wouldn't phrase this as misleading. Arguably, it's more misleading this way. The point is that, if you decompose here, some later pass might disturb the pattern such that it is not recognizable in the backend.

spatel updated this revision to Diff 114420.Sep 8 2017, 12:59 PM

Patch updated:

  1. For a target that doesn't have div+rem, handle the case where the div and rem are in the same block by decomposing the rem.
  2. Add tests for those cases (first 2 tests in the test file).
  3. Update comments to better explain the same block scenario and remove low quality comment.
hfinkel accepted this revision.Sep 8 2017, 2:01 PM

LGTM

This revision is now accepted and ready to land.Sep 8 2017, 2:01 PM
This revision was automatically updated to reflect the committed changes.
jlebar added a subscriber: jlebar.Mar 1 2018, 4:33 PM

After some sleuthing, I discovered that the addition of this new (morally good) pass introduced a moderate regression on NVPTX.

Previously, when given a div/rem pair operating over i64, we'd check if the operands fit in i32 in BypassSlowDivision (during codegenprepare). If so, we'd replace the div/rem pair with i32 div/rem ops. Then in the NVPTX selection dag, we'd compute the rem from this div using the same formulation as here. But since the div was i32, the rem computation would also happen in i32.

After this change, the rem is replaced with i64 arithmetic early on in the pipeline. BypassSlowDivision replaces the i64 div with an i32 div as before. But because this happens during codegenprepare, nobody ever changes the rem computation to happen in 32 bits.

I'm not sure what is the right fix for this. We could teach instcombine to strength-reduce i64 divides where the operands are known to fit into 32 bits into i32 divs? It already does this for zext'ed operands, but not in general. But I'm not sure what is the principled way to tell instcombine whether and how to do this strength reduction. (Like, maybe we want to convert 64-bit divs into 32-bit divs, but do we want to convert 64-bit divs into 8-bit divs, if it fits?)

spatel added a comment.Mar 2 2018, 7:25 AM

After some sleuthing, I discovered that the addition of this new (morally good) pass introduced a moderate regression on NVPTX.

Previously, when given a div/rem pair operating over i64, we'd check if the operands fit in i32 in BypassSlowDivision (during codegenprepare). If so, we'd replace the div/rem pair with i32 div/rem ops. Then in the NVPTX selection dag, we'd compute the rem from this div using the same formulation as here. But since the div was i32, the rem computation would also happen in i32.

After this change, the rem is replaced with i64 arithmetic early on in the pipeline. BypassSlowDivision replaces the i64 div with an i32 div as before. But because this happens during codegenprepare, nobody ever changes the rem computation to happen in 32 bits.

I'm not sure what is the right fix for this. We could teach instcombine to strength-reduce i64 divides where the operands are known to fit into 32 bits into i32 divs? It already does this for zext'ed operands, but not in general. But I'm not sure what is the principled way to tell instcombine whether and how to do this strength reduction. (Like, maybe we want to convert 64-bit divs into 32-bit divs, but do we want to convert 64-bit divs into 8-bit divs, if it fits?)

Can you post an IR example or file a bug that shows the failure? If BypassSlowDivision can get it, but InstCombine can not, then the difference comes down to using computeKnownBits?

Div/rem IR instructions are rare, and we can't do much analysis/combining based on the output, so using ValueTracking to narrow them as a canonicalization step is not an excessive cost IMO.
If that is opposed, we could do the narrowing in this pass as a perf optimization?
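
A hedged sketch of that ValueTracking-based narrowing idea (narrowUDivTo32 is a hypothetical helper, not an existing InstCombine transform, and the i32 target width is just the example from this thread):

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/Support/KnownBits.h"
using namespace llvm;

// If known bits prove both udiv operands fit in 32 bits, do the division in
// i32 and zero-extend the result back to the original width.
static Value *narrowUDivTo32(BinaryOperator &Div, const DataLayout &DL,
                             IRBuilder<> &Builder) {
  Type *Ty = Div.getType();
  if (!Ty->isIntegerTy() || Ty->getIntegerBitWidth() <= 32)
    return nullptr;
  unsigned ExcessBits = Ty->getIntegerBitWidth() - 32;
  KnownBits LHS = computeKnownBits(Div.getOperand(0), DL);
  KnownBits RHS = computeKnownBits(Div.getOperand(1), DL);
  if (LHS.countMinLeadingZeros() < ExcessBits ||
      RHS.countMinLeadingZeros() < ExcessBits)
    return nullptr;                         // can't prove the operands fit
  Type *I32Ty = Builder.getInt32Ty();
  Value *NarrowDiv =
      Builder.CreateUDiv(Builder.CreateTrunc(Div.getOperand(0), I32Ty),
                         Builder.CreateTrunc(Div.getOperand(1), I32Ty));
  return Builder.CreateZExt(NarrowDiv, Ty);
}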

jlebar added a comment.EditedMar 4 2018, 6:54 PM

Can you post an IR example or file a bug that shows the failure? If BypassSlowDivision can get it, but InstCombine can not, then the difference comes down to using computeKnownBits?

Sure, something like:

target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

define void @foo(i64 %a, i64* %ptr1, i64* %ptr2) {
  %b = and i64 %a, 65535
  %div = udiv i64 %b, 42
  %rem = urem i64 %b, 42
  store i64 %div, i64* %ptr1
  store i64 %rem, i64* %ptr2
  ret void
}
$ opt -codegenprepare -S

define void @foo(i64 %a, i64* %ptr1, i64* %ptr2) {
  %b = and i64 %a, 65535
  %1 = trunc i64 %b to i32
  %2 = udiv i32 %1, 42
  %3 = urem i32 %1, 42
  %4 = zext i32 %2 to i64
  %5 = zext i32 %3 to i64
  store i64 %4, i64* %ptr1
  store i64 %5, i64* %ptr2
  ret void
}

$ opt -codegenprepare -S | llc
[snip]
        shr.u32         %r2, %r1, 1;
        mul.wide.u32    %rd3, %r2, 818089009;
        shr.u64         %rd4, %rd3, 34;
        cvt.u32.u64     %r3, %rd4;
        mul.lo.s32      %r4, %r3, 42;
        sub.s32         %r5, %r1, %r4;

$ opt -div-rem-pairs -codegenprepare -S
define void @foo(i64 %a, i64* %ptr1, i64* %ptr2) {
  %b = and i64 %a, 65535
  %1 = trunc i64 %b to i32
  %2 = udiv i32 %1, 42
  %3 = zext i32 %2 to i64
  %4 = mul i64 %3, 42
  %5 = sub i64 %b, %4
  store i64 %3, i64* %ptr1
  store i64 %5, i64* %ptr2
  ret void
}

$ opt -div-rem-pairs -codegenprepare -S | llc
[snip]
        shr.u32         %r2, %r1, 1;
        mul.wide.u32    %rd4, %r2, 818089009;
        shr.u64         %rd5, %rd4, 34;
        cvt.u32.u64     %r3, %rd5;
        mul.wide.u32    %rd6, %r3, 42;
        sub.s64         %rd7, %rd1, %rd6;
spatel added a comment.Mar 5 2018, 7:39 AM

Can you post an IR example or file a bug that shows the failure? If BypassSlowDivision can get it, but InstCombine can not, then the difference comes down to using computeKnownBits?

Sure, something like:

target datalayout = "e-i64:64-v16:16-v32:32-n16:32:64"
target triple = "nvptx64-nvidia-cuda"

define void @foo(i64 %a, i64* %ptr1, i64* %ptr2) {
  %b = and i64 %a, 65535
  %div = udiv i64 %b, 42
  %rem = urem i64 %b, 42
  store i64 %div, i64* %ptr1
  store i64 %rem, i64* %ptr2
  ret void
}

Are you confident that the problem is limited to cases with a constant div/rem operand and masking of the variable that could be replaced by a trunc? If so, then we could add a narrow pattern match fix without using computeKnownBits:

Name: udiv_shrink
%b = and i32 %a, 65535
%r = udiv i32 %b, 42

=>

%t = trunc i32 %a to i16
%u = udiv i16 %t, 42
%r = zext i16 %u to i32

https://rise4fun.com/Alive/EHK

I was worried that canEvaluateZExtd() would try to invert that transform, but either by oversight or intention, we don't widen udiv/urem there like most binops.

jlebar added a comment.Mar 5 2018, 7:51 AM

Are you confident that the problem is limited to cases with a constant div/rem operand and masking of the variable that could be replaced by a trunc?

Actually I'm pretty confident the problem is *not* limited to such cases -- sorry for making a misleading testcase. In the code I'm interested in, we know the bit-width of the dividend because of a branch on the dividend's value (x < some_constant) or because an llvm.assume tells us so.

spatel added a comment.Mar 5 2018, 8:07 AM

Are you confident that the problem is limited to cases with a constant div/rem operand and masking of the variable that could be replaced by a trunc?

Actually I'm pretty confident the problem is *not* limited to such cases -- sorry for making a misleading testcase. In the code I'm interested in, we know the bit-width of the dividend because of a branch on the dividend's value (x < some_constant) or because an llvm.assume tells us so.

Ah, then we'll need a stronger solution. If the shrinkability is based on propagation via cmp/br, then that's not something that instcombine would/should handle? Maybe we should add some shrinking logic to correlated-propagation?

Maybe we should add some shrinking logic to correlated-propagation?

That sounds like a plan to me. I put up a (perhaps weak) attempt at writing the patch in D44102.