This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Introducing Aggressive Instruction Combine pass
ClosedPublic

Authored by aaboud on Sep 27 2017, 5:19 AM.

Details

Summary

This new approach is a replacement for D37195.

Motivation:
The InstCombine algorithm runs with high complexity (it keeps iterating until the code no longer changes), so each piece inside the outer loop must have very small complexity (preferably O(1)).

Problem
There are instruction patterns that cost more than O(1) to identify and modify, so they cannot be added to the InstCombine algorithm.

Solution
An AggressiveInstCombine pass, which runs separately from the InstCombine pass and can perform expression-pattern optimizations, each of which may take up to O(n) complexity, where n is the number of instructions in the function.
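
For illustration only, a minimal sketch of the pass shape being proposed (the names and signatures here are assumptions, not the patch's actual API): each expression-pattern combiner makes a single O(n) sweep over the function, with no outer fixed-point loop.

#include "llvm/ADT/ArrayRef.h"
#include "llvm/IR/Function.h"
using namespace llvm;

// Hypothetical single-sweep combiner interface; the real patch's classes
// differ in detail.
struct ExprPatternCombiner {
  virtual ~ExprPatternCombiner() = default;
  virtual bool run(Function &F) = 0; // one O(n) sweep, no re-iteration
};

// Hypothetical driver: run every registered expression-pattern combiner
// exactly once over the function.
static bool runAggressiveInstCombine(Function &F,
                                     ArrayRef<ExprPatternCombiner *> Combiners) {
  bool Changed = false;
  for (ExprPatternCombiner *C : Combiners)
    Changed |= C->run(F); // each combiner visits the function exactly once
  return Changed;
}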

In this patch I introduce:

  1. The AggressiveInstCombiner class, the main pass that runs all expression-pattern optimizations.
  2. The TruncInstCombine class, the first expression-pattern optimization.

TruncInstCombine currently supports only simple instructions: add, sub, and, or, xor, and mul.
The main difference between this and canEvaluateTruncated from InstCombine is that this one supports:

  1. Instructions with multiple uses, as long as all of them are dominated by the truncated instruction.
  2. Truncating to a different width than the original trunc instruction requires (useful when we reduce the expression width, even if we do not eliminate any instruction).

Next, I will add the following support to this TruncInstCombine class, each in a separate patch:

  1. select, shufflevector, extractelement, insertelement
  2. udiv, urem
  3. shl, lshr, ashr
  4. phi node (and loop handling)

Event Timeline

aaboud updated this revision to Diff 116798.Sep 27 2017, 5:21 AM

Fixed "no new line at end of file" issue.

nlopes added a subscriber: nlopes.Sep 28 2017, 6:45 AM
craig.topper edited edge metadata.Oct 6 2017, 4:13 PM

Do you have tests for "Truncating to a different width than the original trunc instruction requires (useful when we reduce the expression width, even if we do not eliminate any instruction)"?

lib/Transforms/InstCombine/InstCombinePatterns.cpp
215 ↗(On Diff #116798)

smaller*

223 ↗(On Diff #116798)

infinite*

252 ↗(On Diff #116798)

destination*

255 ↗(On Diff #116798)

Does this call isLegalInteger using the scalar bit width for vectors? Not sure that's valid.

259 ↗(On Diff #116798)

destination*

397 ↗(On Diff #116798)

dominates*

"we" looks unnecessarry

Hi Amjad,

I didn't do a thorough drill down of all the changes (I'll leave that to Craig and Sanjay), but skimmed through it. The design seems reasonable, as long as others are ok with the instcombine plugin.
I can't tell all the effects this will have, but take care that the zext->(operations...)->trunc optimization doesn't remove a zext on a load that feeds into an operation such that we turn a "movz[bw] mem -> reg" into a "mov[bw] mem -> [bw]reg", since this can introduce extra merge/false-dependence hits on most newer Intel architectures.

Zia.

lib/Transforms/InstCombine/InstCombinePatterns.cpp
146 ↗(On Diff #116798)

Are you planning on doing these peepholes here? If not, is there any other reason for leaving these cases in this switch?

180 ↗(On Diff #116798)

Same as above. Any reason for not breaking for trunc/zext/sext?

aaboud marked 6 inline comments as done.Oct 15 2017, 5:55 AM

Thanks Craig and Zia for the review and sorry for the late answer.
Please see the answers below.

Do you have tests for "Truncating to a different width than the original trunc instruction requires (useful when we reduce the expression width, even if we do not eliminate any instruction)"?

I do have tests, but this case is not reachable yet with this patch; we need the shift/udiv/urem instructions to hit it. I will introduce those in the next patches.
In this patch, I just implemented the infrastructure for this optimization.

Hi Amjad,

I didn't do a thorough drill down of all the changes (I'll leave that to Craig and Sanjay), but skimmed through it. The design seems reasonable, as long as others are ok with the instcombine plugin.
I can't tell all the effects this will have, but take care that the zext->(operations...)->trunc optimization doesn't remove a zext on a load that feeds into an operation such that we turn a "movz[bw] mem -> reg" into a "mov[bw] mem -> [bw]reg", since this can introduce extra merge/false-dependence hits on most newer Intel architectures.

Zia.

I believe that we have passes in the codegen to make sure we will not suffer from these false dependencies, right?
I do not believe that instcombine should look that far to recognize such cases.

lib/Transforms/InstCombine/InstCombinePatterns.cpp
146 ↗(On Diff #116798)

I do need to handle these cases here as I am calling "getRelevantOperands()" function from "getMinBitWidth()" function for all supported cases (without a switch).

180 ↗(On Diff #116798)

I can add the break here; I agree that it makes the code faster.

255 ↗(On Diff #116798)

Can you explain what you mean by "that's not valid"?
Are you referring to the algorithm?

Are you concerned about cases where a scalar type is legal (e.g., i32) while a vector type is not (e.g., <8 x i32>), or the opposite direction?

I believe that my algorithm should not care about the number of elements in the type, only about the width.
The reason for this check is to preserve the old behavior, where we will not reduce a legal type to a non-legal type.

Do you think that we should do a more accurate check?

aaboud updated this revision to Diff 119079.Oct 15 2017, 5:55 AM
aaboud marked an inline comment as done.

Addressed Craig's and Zia's comments.

craig.topper added inline comments.Oct 16 2017, 4:39 PM
lib/Transforms/InstCombine/InstCombinePatterns.cpp
255 ↗(On Diff #116798)

I guess my point was just that the legality of the scalar type is unrelated to the legality of the vector type. On x86, for example, i64 isn't legal in 32-bit mode, but v2i64 is. If I remember right, the existing code doesn't even call isLegalInteger for vector types? I'd check, but someone is hammering the server I normally use.
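
A short sketch of the hazard Craig is describing (variable names assumed): the scalar-legality query is simply not meaningful for vector types.

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/Type.h"
using namespace llvm;

// On 32-bit x86, isLegalInteger(64) is false, yet <2 x i64> is a perfectly
// usable vector type -- so gating a vector transform on the element width's
// scalar legality gives the wrong answer.
static bool scalarWidthIsLegal(const DataLayout &DL, Type *Ty) {
  return DL.isLegalInteger(Ty->getScalarSizeInBits());
}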

aaboud added inline comments.Oct 18 2017, 8:23 AM
lib/Transforms/InstCombine/InstCombinePatterns.cpp
255 ↗(On Diff #116798)

I agree that I have a logical bug here.
I was trying to keep the same semantics as the original code:

if ((DestTy->isVectorTy() || shouldChangeType(SrcTy, DestTy)) &&
      canEvaluateTruncated(Src, DestTy, *this, &CI)) {

I guess I need to ignore vector types (as in the above code) and fix the logic to this:

if (MinBitWidth > ValidBitWidth) {
  Type *Ty = DL.getSmallestLegalIntType(DestTy->getContext(), MinBitWidth);
  if (!Ty)
    return nullptr;
  // Update the minimum bit-width with the new destination type bit-width.
  MinBitWidth = Ty->getScalarSizeInBits();
} else { // MinBitWidth == ValidBitWidth
  if (!DestTy->isVectorTy() && !shouldChangeType(SrcTy, DestTy))
    return nullptr;
}
Type *NewDstSclTy = IntegerType::get(DestTy->getContext(), MinBitWidth);
spatel edited edge metadata.

If I'm seeing this correctly, it's an independent pass within InstCombine. It sits outside InstCombine's iteration loop, so it doesn't interact with the rest of the pass. What is the advantage of this approach vs. making a standalone pass?

If I'm seeing this correctly, it's an independent pass within InstCombine. It sits outside InstCombine's iteration loop, so it doesn't interact with the rest of the pass. What is the advantage of this approach vs. making a standalone pass?

Hi Sanjay,
This code, which I am suggesting below, is intended to replace (and improve) canEvaluateTruncated from the current InstCombine implementation.
The logic of this code is similar to what instcombine does, but the implementation is different.
That said, I believe that running all the current instcombine tests against this new functionality is a must; to make that possible, it seems obvious that we need to be part of the instcombine pass.

Note that the implementation is done in a way that moving it to a separate pass can be done with zero effort, but at the cost of ignoring/dropping a few hundred LIT tests.

Do you still think that it should be in a separate pass?

aaboud added a subscriber: zvi.Oct 25 2017, 2:25 AM
aaboud updated this revision to Diff 120219.Oct 25 2017, 2:36 AM

Updated the patch according to Craig's comment. (Fixed a minor logical bug.)

That said, I believe that running all the current instcombine tests against this new functionality is a must; to make that possible, it seems obvious that we need to be part of the instcombine pass.

Note that the implementation is done in a way that moving it to a separate pass can be done with zero effort, but at the cost of ignoring/dropping a few hundred LIT tests.

I agree that testing must be done, but I don't see how that makes it obvious that this should be part of instcombine? If you're concerned that something else in instcombine will inhibit or invert this transform, you could add tests under test/Transforms/PhaseOrdering/ to make sure that doesn't happen. I think you've done the hard part (the code itself) already. :)

The major disadvantage of being in instcombine is that this code will be running 5-6 times in a typical pipeline when it probably doesn't need to.

Do you still think that it should be in a separate pass?

I don't know enough to say. I'm stating a concern based on feedback I've gotten about the size and cost of InstCombine (eg, http://lists.llvm.org/pipermail/llvm-dev/2017-May/113184.html , http://lists.llvm.org/pipermail/llvm-dev/2017-September/117151.html ). I'll let more experienced developers (cc'ing @nlopes @hfinkel @regehr ... ) decide what's the right way forward.

Also, I know that others are exploring ways to auto-regenerate some portion of InstCombine for greater efficiency, so it might be worth considering if/how that goal interacts with a large patch like this one.

majnemer added inline comments.Oct 25 2017, 7:54 AM
lib/Transforms/InstCombine/InstCombineInternal.h
782 ↗(On Diff #120219)

Ditto.

lib/Transforms/InstCombine/InstCombinePatterns.cpp
90 ↗(On Diff #120219)

default;

aaboud updated this revision to Diff 120258.Oct 25 2017, 8:09 AM
aaboud marked 2 inline comments as done.

Answered David's comment.

That said, I believe that running all the current instcombine tests against this new functionality is a must; to make that possible, it seems obvious that we need to be part of the instcombine pass.

Note that the implementation is done in a way that moving it to a separate pass can be done with zero effort, but at the cost of ignoring/dropping a few hundred LIT tests.

I agree that testing must be done, but I don't see how that makes it obvious that this should be part of instcombine? If you're concerned that something else in instcombine will inhibit or invert this transform, you could add tests under test/Transforms/PhaseOrdering/ to make sure that doesn't happen. I think you've done the hard part (the code itself) already. :)

The major disadvantage of being in instcombine is that this code will be running 5-6 times in a typical pipeline when it probably doesn't need to.

Is it a bad thing to run this code 5-6 times? It has O(n) complexity, where n = the number of instructions in the function.
Wouldn't it catch more patterns once we run other optimizations?
I need this code to run at least twice: once during regular compilation and once as part of LTO optimization.
Would it be better if I moved the code to a new pass, "PatternInstructionCombinePass", but ran that new pass from "addInstructionCombiningPass"? It would still run 5-6 times!

Do you still think that it should be in a separate pass?

I don't know enough to say. I'm stating a concern based on feedback I've gotten about the size and cost of InstCombine (eg, http://lists.llvm.org/pipermail/llvm-dev/2017-May/113184.html , http://lists.llvm.org/pipermail/llvm-dev/2017-September/117151.html ). I'll let more experienced developers (cc'ing @nlopes @hfinkel @regehr ... ) decide what's the right way forward.

Also, I know that others are exploring ways to auto-regenerate some portion of InstCombine for greater efficiency, so it might be worth considering if/how that goal interacts with a large patch like this one.

Notice that although this code is part of the InstCombine pass, it is totally separate; I do not see why any change to InstCombine would affect it, unless you think we can auto-generate this optimization!

To summarize, I really do not mind moving it to a separate pass, but I would like to get this optimization committed as soon as possible.
I appreciate your review and direction; please advise on the best way you think I should implement this optimization.

Thanks,
Amjad

To summarize, I really do not mind moving it to a separate pass, but I would like to get this optimization committed as soon as possible.
I appreciate your review and direction; please advise on the best way you think I should implement this optimization.

As I said, I'm deferring to others on the way forward, so if everyone else thinks this is good, then I'm not objecting. Others have looked at the code closer than me, so I'll let them provide more feedback and/or approval.

To summarize, I really do not mind moving it to a separate pass, but I would like to get this optimization committed as soon as possible.
I appreciate your review and direction; please advise on the best way you think I should implement this optimization.

As I said, I'm deferring to others on the way forward, so if everyone else thinks this is good, then I'm not objecting. Others have looked at the code closer than me, so I'll let them provide more feedback and/or approval.

Thanks Sanjay,
I appreciate your help.
I will wait a little longer for others to comment on this patch.

zvi added a comment.EditedOct 26 2017, 6:45 AM

Amjad, some questions about where we want this work to evolve.

Looking at the following example:

define i16 @foo(i8 %B, i16 %A, i1* %pbit) {
           %zext = zext i8 %B to i32
         %mul = mul i32 %zext, %zext
       %LSB = and i32 %mul, 255 ; <----------- First use
     %cmp = icmp ne i32 %LSB, 0
   store i1 %cmp, i1* %pbit
       %sext = sext i16 %A to i32
     %and = and i32 %mul, %sext ; <---------- Second use
   %trunc = trunc i32 %and to i16
   ret i16 %trunc
 }

The indentation should help with seeing the two branches in the expression DAG. The 'mul' is used by both 'and' instructions.
The most 'type-shrunken' form with no duplications should be this:

define i16 @foo(i8 %B, i16 %A, i1* %pbit) {
  %zext = zext i8 %B to i16
  %mul = mul i16 %zext, %zext
  %LSB = and i16 %mul, 255
  %cmp = icmp ne i16 %LSB, 0
  store i1 %cmp, i1* %pbit, align 1
  %and = and i16 %mul, %A
  ret i16 %and
}

I assume that this patch will not cover this case because the bottom-up traversal starting from the 'trunc' will only visit one branch of the two. The bail-out condition

if (UI != CurrentTruncInst && !InstInfoMap.count(UI))
  return nullptr;

in getBestTruncatedType() will fire because one of the 'mul' users will not be mapped in the expressions DAG. If, for example, the expression DAG were constructed by traversing both defs and uses, it would have covered the entire function in the above example, and there would be sufficient information to deduce that both branches of the expression DAG can be shrunk.
Does this make sense?

lib/Transforms/InstCombine/InstCombinePatterns.cpp
62 ↗(On Diff #120258)

Didn't see any actual uses of AC and DT. Will they be used in a follow-up patch? Even if yes, better to remove them to avoid breaking -Werror builds.

139 ↗(On Diff #120258)

Instead of populating a container, consider returning a pair of begin/end iterators to the operands of interest (or an op_range). Of course, this will only work if the operands of interest are consecutive.
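
One way the suggestion could look (a sketch, not the patch's code): hand back an iterator range over the consecutive operands of interest instead of copying them into a container.

#include "llvm/ADT/iterator_range.h"
#include "llvm/IR/Instruction.h"
using namespace llvm;

// Valid only while the instruction is alive, and only when the interesting
// operands are consecutive (the caveat above); PHI nodes, as aaboud notes
// below, would not fit this shape.
static iterator_range<Instruction::op_iterator>
relevantOperands(Instruction *I) {
  return make_range(I->op_begin(), I->op_end());
}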

260 ↗(On Diff #120258)

Please add a comment explaining the considerations for bailing out when MinBitWidth == ValidBitWidth.

270 ↗(On Diff #120258)

I am commenting about this because it got me a bit confused. I think the correct term would be post-dominated or "users that dominate the truncate instruction".

271 ↗(On Diff #120258)

*expression

311 ↗(On Diff #120258)

Better to call .lookup() rather than operator[] to avoid side effects in asserts.
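
Why this matters, as a tiny sketch (the value type of InstInfoMap is assumed here): DenseMap::operator[] default-constructs and inserts an entry for a missing key, so calling it inside assert() mutates the map -- and only in asserts-enabled builds. lookup() is a pure read.

#include "llvm/ADT/DenseMap.h"
#include "llvm/IR/Instruction.h"
#include <cassert>
using namespace llvm;

static void checkMapped(const DenseMap<Instruction *, unsigned> &InstInfoMap,
                        Instruction *I) {
  // lookup() returns a value-initialized result for missing keys and never
  // modifies the map; InstInfoMap[I] would insert {I, 0} as a side effect
  // (and would not even compile on a const map).
  assert(InstInfoMap.lookup(I) && "instruction should already be mapped");
}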

320 ↗(On Diff #120258)

Better to call .lookup() rather than operator[] to avoid side effects in asserts.

358 ↗(On Diff #120258)

Consider moving this 'if' block under the appropriate case statement above.

359 ↗(On Diff #120258)

STLExtras.h can help with saving a few bytes of code :)

auto Entry = find(Worklist, I);
aaboud marked 8 inline comments as done.Oct 26 2017, 7:45 AM

Thanks Zvi for the comments.
I fixed most of them and will upload a new patch soon.

lib/Transforms/InstCombine/InstCombinePatterns.cpp
62 ↗(On Diff #120258)

Indeed, I will be using them in the next patches.
I do not think there is a warning/error issue with unused class members, but I will remove them from this patch and add them back later when they are actually used.

139 ↗(On Diff #120258)

It will not work for the PHINode instruction.

aaboud updated this revision to Diff 120419.Oct 26 2017, 7:51 AM
aaboud marked an inline comment as done.

Addressed Zvi's comments.

That said, I believe that running all the current instcombine tests against this new functionality is a must; to make that possible, it seems obvious that we need to be part of the instcombine pass.

Note that the implementation is done in a way that moving it to a separate pass can be done with zero effort, but at the cost of ignoring/dropping a few hundred LIT tests.

I agree that testing must be done, but I don't see how that makes it obvious that this should be part of instcombine? If you're concerned that something else in instcombine will inhibit or invert this transform, you could add tests under test/Transforms/PhaseOrdering/ to make sure that doesn't happen. I think you've done the hard part (the code itself) already. :)

The major disadvantage of being in instcombine is that this code will be running 5-6 times in a typical pipeline when it probably doesn't need to.

Is it a bad thing to run this code 5-6 times? It has O(n) complexity, where n = the number of instructions in the function.
Wouldn't it catch more patterns once we run other optimizations?
I need this code to run at least twice: once during regular compilation and once as part of LTO optimization.
Would it be better if I moved the code to a new pass, "PatternInstructionCombinePass", but ran that new pass from "addInstructionCombiningPass"? It would still run 5-6 times!

Do you still think that it should be in a separate pass?

I don't know enough to say. I'm stating a concern based on feedback I've gotten about the size and cost of InstCombine (eg, http://lists.llvm.org/pipermail/llvm-dev/2017-May/113184.html , http://lists.llvm.org/pipermail/llvm-dev/2017-September/117151.html ). I'll let more experienced developers (cc'ing @nlopes @hfinkel @regehr ... ) decide what's the right way forward.

Also, I know that others are exploring ways to auto-regenerate some portion of InstCombine for greater efficiency, so it might be worth considering if/how that goal interacts with a large patch like this one.

Notice that although this code is part of the InstCombine pass, it is totally separate; I do not see why any change to InstCombine would affect it, unless you think we can auto-generate this optimization!

To summarize, I really do not mind moving it to a separate pass,

I think that this should be a separate pass. Everything in InstCombine is currently part of InstCombine's fixed-point iteration scheme. Having some things in InstCombine that are, and some that aren't, seems unnecessarily confusing. If this essentially runs as a separate pass, then I think we should just make it one, and then we can schedule it as we see fit.

Also, I'll note that this optimization seems very similar to PPCTargetLowering::DAGCombineTruncBoolExt, which optimizes cases like trunc(binary-ops(binary-ops(zext(x), zext(y)), ...)), and perhaps also PPCTargetLowering::DAGCombineExtBoolTrunc, which optimizes cases like zext(binary-ops(binary-ops(trunc(x), trunc(y)), ...)). Those implementations are not recursive but use a work queue, and I suggest doing the same here.

but I would like to get this optimization committed as soon as possible.
I appreciate your review and direction; please advise on the best way you think I should implement this optimization.

Thanks,
Amjad

zvi added a comment.Oct 30 2017, 4:32 AM

I know you decided to move this to a new pass, but here are a couple more comments that will still be relevant.

test/Transforms/InstCombine/trunc_pattern.ll
13 ↗(On Diff #120419)

dominated -> post-dominated

18 ↗(On Diff #120419)

Maybe a nitpick of mine, but I find it easier to follow the checks when they are right above the related sequence of instructions. Another option is to break it down into multiple functions.

aaboud updated this revision to Diff 120816.Oct 30 2017, 7:12 AM
aaboud marked 2 inline comments as done.
aaboud retitled this revision from [InstCombine] Introducing Pattern Instruction Combine plug-in into InstCombine pass to [InstCombine] Introducing Aggressive Instruction Combine pass.
aaboud edited the summary of this revision. (Show Details)

Separated the implementation from the InstCombine pass and introduced a new pass called AggressiveInstCombine, which is called only twice (compared to InstCombine, which is called ~6 times): once as part of the function simplification passes and once as part of the LTO optimization passes.

hfinkel added inline comments.Oct 30 2017, 7:02 PM
lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
77

You should use a worklist here, not recursion. Then there'll be no need for the depth limit (or it can be very large).
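
A minimal sketch of the shape being suggested (names assumed; the patch's actual traversal differs): an explicit worklist removes the recursion depth limit entirely.

#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/IR/Instruction.h"
#include "llvm/Support/Casting.h"
using namespace llvm;

// Collect the instruction DAG feeding Root, iteratively.
static void collectExprDag(Instruction *Root,
                           SmallVectorImpl<Instruction *> &Dag) {
  SmallVector<Instruction *, 16> Worklist = {Root};
  SmallPtrSet<Instruction *, 16> Visited;
  while (!Worklist.empty()) {
    Instruction *I = Worklist.pop_back_val();
    if (!Visited.insert(I).second)
      continue; // already reached through another use
    Dag.push_back(I);
    for (Value *Op : I->operands())
      if (auto *OpI = dyn_cast<Instruction>(Op))
        Worklist.push_back(OpI);
  }
}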

114

You can mention specific things here. Comments like "TODO: more work" are not helpful. You have a list in your patch description:

select, shufflevector, extractelement, insertelement
udiv, urem
shl, lshr, ashr
phi node (and loop handling)

158

If we have a DAG of instructions with multiple trunc outputs, we'll end up walking the DAG once per trunc output?

182

If your goal is only to produce legal integer widths (which might miss some cases for vectorization), then I think that you should just fold this information into getMinBitWidth so that getMinBitWidth returns the minimum *legal* bit width.
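
A sketch of what folding legality into the width computation might look like (the helper name here is made up):

#include "llvm/IR/DataLayout.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Type.h"
using namespace llvm;

// Return the minimum *legal* bit width that can hold MinBitWidth, or 0 if
// no legal integer type is wide enough.
static unsigned getMinLegalBitWidth(const DataLayout &DL, LLVMContext &Ctx,
                                    unsigned MinBitWidth) {
  if (DL.isLegalInteger(MinBitWidth))
    return MinBitWidth;
  if (Type *Ty = DL.getSmallestLegalIntType(Ctx, MinBitWidth))
    return Ty->getScalarSizeInBits();
  return 0;
}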

282

Please write actual messages in llvm_unreachable. Perhaps something like:

llvm_unreachable("Unhandled instruction");
aaboud marked 4 inline comments as done.Nov 1 2017, 10:50 AM

Thanks Hal for the review.
I will update the patch with the changes you asked for.
Also, see one answer below.

lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
158

That is correct, and we will not be able to perform the transformation for any of these trunc instructions.
This means that the complexity of this pass is at most O(n^2), in the worst case where n/2 of the instructions are DAG nodes and n/2 are trunc instructions that use/truncate the DAG result.

Should we worry about this case? Can we solve it?

aaboud updated this revision to Diff 121149.Nov 1 2017, 10:53 AM

Addressed Hal's comments.

  1. Removed all recursive functions and replaced them with Worklist + Stack containers. Note, this is needed for handling loops (which will be added with the PHINode patch).
  2. Improved the compile-time performance of the pass by implementing an early exit for multi-use instructions that are not post-dominated by the TruncInst.
zvi added a comment.Nov 2 2017, 7:07 AM

Some more minor comments

docs/Passes.rst
19

For the sake of consistency with the other titles, please add more dashes to align with the title.

21

*patterns *expressions

24

*reduces the width of expressions

27

Maybe also say that the pattern scan may cover the entire function, as opposed to the locality-limited instcombine patterns?

include/llvm/InitializePasses.h
2

These declarations are not perfectly sorted, but still maybe try to move this up.

include/llvm/Transforms/Scalar.h
8

*expressions *patterns

lib/LTO/LLVMBuild.txt
2

Please preserve sorted ordering

lib/Passes/LLVMBuild.txt
1–22

Please preserve sorted ordering

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
32

Can you please rephrase this so that it is obvious that the pass can be run more than once and that the internal mechanism limits the pattern matchers to run once? Maybe something like this:

It differs from InstCombine in that each pattern combiner is run only once, as opposed to instcombine's multi-iteration ...
55

Is there an importance to the order of the calls? If not, maybe group the set/addPreserve* calls and the addRequired call into two groups separated by a newline. IMHO it's more readable.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombineInternal.h
22

Move two lines up

70

post-dominated

lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
41

+1 for preserving DebugInfo. Is there any other metadata that may need to be copied? I can't think of anything in particular; just wanted to raise the possibility.

45

Is it possible to reuse llvm::ReplaceInstWithInst instead of doing the low-level work explicitly?

55

IIRC, in one of the previous revisions of this patch there was a comment explaining why these cast instructions are skipped? Can you please revive it or add a new one?

zvi added inline comments.Nov 2 2017, 7:07 AM
lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
65

Could we avoid pushing constants, and maybe even generalize to avoid values that are not instructions? At least for these cases it may be OK, but not for others, such as divide, where we need to be more cautious.

203

At this point a mapping for IOp must exist in InstInfoMap, right? Then please use lookup() or find(). Also, better to avoid searching for the same key more than once.

210

lookup()

228

IMO this would be more readable and would guarantee that code changes won't lead to uses of stale values of MinBitWidth:

MinBitWidth = OrigBitwidth;

and drop the return

353

I may have missed this, but why defer insertion of the instruction into the BB to this point? Also, consider using the IRBuilder instead of calling ::Create* above and below.
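
A sketch of the IRBuilder alternative (operand names assumed): the builder creates the instruction and inserts it at the chosen point in one step, so there is no separate deferred-insertion step to manage.

#include "llvm/IR/IRBuilder.h"
using namespace llvm;

static Value *buildShrunkenAdd(Instruction *InsertPt, Value *LHS, Value *RHS) {
  IRBuilder<> Builder(InsertPt);      // insertion point fixed up front
  return Builder.CreateAdd(LHS, RHS); // created *and* inserted here
}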

aaboud marked 15 inline comments as done.Nov 2 2017, 8:03 AM

Thanks Zvi for the comments.
I will upload a new patch with most of the comments fixed.
See a few answers below.

lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
55

Not sure if there is any importance to the order.
I just did what InstructionCombiningPass does!

lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
41

This is what InstCombine preserves.

45

llvm::ReplaceInstWithInst also deletes the old instruction, which we are not ready to delete at this point.

65

I prefer not to complicate this function; it should return the operands that are relevant to the optimization. The caller should check whether each relevant operand is a constant or an instruction; there will be cases where even a constant needs to be evaluated and will not be skipped immediately.

aaboud updated this revision to Diff 121318.Nov 2 2017, 8:16 AM

Addressed Zvi's Comments.

aaboud updated this revision to Diff 121320.Nov 2 2017, 8:30 AM

Minor fix: forgot to use IRBuilder in one case in the previous patch.

zvi added a comment.Nov 2 2017, 8:32 AM

Thanks, Amjad! This patch LGTM, but I think it would be best to wait for an LGTM from one of the assigned reviewers.

aaboud updated this revision to Diff 121332.Nov 2 2017, 9:30 AM

Minor typo update.

This is the latest version, in which I have addressed all the previous comments.
Please let me know if you have any final comments or whether I can go ahead and commit the patch.

Thanks

zvi commandeered this revision.Nov 9 2017, 12:44 AM
zvi added a reviewer: aaboud.

Commandeering this patch while Amjad is away on a few weeks of vacation.
Also sending a friendly ping to the reviewers. AFAIK, all comments were addressed as of the latest revision of this patch. Please let me know if I missed anything. Thanks.

craig.topper added inline comments.Nov 14 2017, 8:59 AM
lib/Transforms/AggressiveInstCombine/AggressiveInstCombine.cpp
63

What about the new pass manager?
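
For context, roughly what new-pass-manager support looks like for a function pass (a sketch; the class name anticipates the pass's eventual naming and the run body is elided):

#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"
using namespace llvm;

class AggressiveInstCombinePass
    : public PassInfoMixin<AggressiveInstCombinePass> {
public:
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
    bool Changed = false; // would run the pattern combiners once over F
    return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
  }
};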

lib/Transforms/AggressiveInstCombine/AggressiveInstCombineInternal.h
90

This sentence reads funny

lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
81

Can you consistently use auto with dyn_cast throughout this patch?

176

LLVM style prefers "auto *IOp". We don't want auto to hide the fact that it's a pointer. Please scrub the whole patch for this.

207

In the case of vectors, is this using legal scalar integer types to constrain the vector element type? I'm not sure if that's the right behavior. The legal scalar types don't necessarily imply anything about vector types.

I think we generally try to avoid creating vector types that didn't appear in the IR.

211

Isn't this MinBitWidth == TruncBitWidth?

lib/Transforms/IPO/PassManagerBuilder.cpp
3

What about the new pass builder for the new pass manager?

zvi added inline comments.Nov 16 2017, 4:45 AM
lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
207

That's a good point. Will look into this.

211

Your observation is correct, but the comment is also correct, and it explains something that may not be obvious.
At this point, we have completed visiting the expression tree starting from the truncate's operand and found MinBitWidth by taking the max of all the min-bit-width requirements of the predecessors. Now, since this is a truncate instruction, by construction of the expression tree it follows that the computed MinBitWidth can never be less than the truncate's return type's size in bits; in other words, MinBitWidth >= TruncBitWidth.
The case of MinBitWidth > TruncBitWidth is handled in the then-block just above, and what remains is to handle MinBitWidth == TruncBitWidth in the else-block here.
I can put this in a comment if you think it would be helpful.

zvi updated this revision to Diff 123159.Nov 16 2017, 4:47 AM
zvi edited the summary of this revision. (Show Details)

Address some of Craig's recent comments.

zvi marked 3 inline comments as done.Nov 16 2017, 4:49 AM
craig.topper added inline comments.Nov 16 2017, 10:50 PM
lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
211

I was just saying that it should say TruncBitWidth, not ValidBitWidth, right?

zvi added inline comments.Nov 19 2017, 12:48 AM
lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
211

Ah, right :)

zvi updated this revision to Diff 123509.Nov 19 2017, 12:15 PM

Address the last of Craig's comments:

  • Thanks, @lsaba, for porting the pass to the new PassManager.
  • Removed shrinkage of vector types until we sort out whether it is generally allowed to shrink the element types of vector operations.
  • Some minor fixes to comments.
zvi marked 10 inline comments as done.Nov 19 2017, 12:19 PM
zvi updated this revision to Diff 123510.Nov 19 2017, 12:23 PM

Rebase on ToT. NFC in this revision.

escha added a subscriber: escha.Nov 27 2017, 11:40 AM

Two comments on the trunc thing:

  1. Thank you!!! As a GPU target maintainer, one of my main frustrations is how much LLVM *loves* to generate code that is needlessly too wide when smaller would do. We mostly have avoided this problem due to being float-heavy, but as integer code becomes more important, I absolutely love any chance I can get to reduce 32-bit to 16-bit and save register space accordingly.
  2. I'm worried about this because the DAG *loves* to eliminate """redundant""" truncates and extensions, even if they're both marked as free. I've accidentally triggered infinite loops many times when trying to trick the DAG into emitting code that keeps intermediate variables small, an extreme example being something like this:
; pseudo-asm
; R1 = *b + (*a & 15);
; R2 = *c + (*a >> 16) & 15;
load.32 R0, [a]
load.32 R1, [b]
load.32 R2, [c]
shr.32 R0H, R0, 16
and.16 R0L, R0L, 15
and.16 R0H, R0H, 15
add.32 R1, R1, R0L
add.32 R2, R2, R0H

The DAG will usually try to turn this into this:

load.32 R0, [a]
load.32 R1, [b]
load.32 R2, [c]
shr.32 R3, R0, 16
and.32 R0, R0, 15
and.32 R3, R3, 15
add.32 R1, R1, R0
add.32 R2, R2, R3

This is just a hypothetical example, but in general this makes me worry, based on past attempts at experimentation in this realm.

davide added a subscriber: davide.Nov 27 2017, 1:45 PM

I'm really worried that the compile time hit of this for LTO will be non-negligible. Do you have numbers?

zvi updated this revision to Diff 124519.Nov 27 2017, 11:34 PM

Add missing AggressiveInstCombine.h and fix missing 'opt' dependency. Thanks, @lsaba, for noticing.

zvi added a comment.Nov 27 2017, 11:59 PM

Two comments on the trunc thing:

  1. Thank you!!! As a GPU target maintainer, one of my main frustrations is how much LLVM *loves* to generate code that is needlessly too wide when smaller would do. We mostly have avoided this problem due to being float-heavy, but as integer code becomes more important, I absolutely love any chance I can get to reduce 32-bit to 16-bit and save register space accordingly.

Sometimes it's LLVM, and sometimes it's the frontend that is required to extend small typed values before performing operations.

  2. I'm worried about this because the DAG *loves* to eliminate """redundant""" truncates and extensions, even if they're both marked as free. I've accidentally triggered infinite loops many times when trying to trick the DAG into emitting code that keeps intermediate variables small, an extreme example being something like this:
; pseudo-asm
; R1 = *b + (*a & 15);
; R2 = *c + (*a >> 16) & 15;
load.32 R0, [a]
load.32 R1, [b]
load.32 R2, [c]
shr.32 R0H, R0, 16
and.16 R0L, R0L, 15
and.16 R0H, R0H, 15
add.32 R1, R1, R0L
add.32 R2, R2, R0H

The DAG will usually try to turn this into this:

load.32 R0, [a]
load.32 R1, [b]
load.32 R2, [c]
shr.32 R3, R0, 16
and.32 R0, R0, 15
and.32 R3, R3, 15
add.32 R1, R1, R0
add.32 R2, R2, R3

This is just a hypothetical example, but in general this makes me worry, based on past attempts at experimentation in this realm.

I'm not sure I fully understand the concern with this patch, but if the problem is root-caused to Instruction Selection, shouldn't we fix it there? If DAGCombiner's elimination of free truncates/extensions is an issue, have you considered predicating the specific combines with TLI hooks?
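
For reference, a sketch of the kind of hook being alluded to (the target class is hypothetical; isTruncateFree itself is a real TargetLowering hook that several DAG combines consult):

#include "llvm/CodeGen/TargetLowering.h"
using namespace llvm;

class MyGPUTargetLowering : public TargetLowering {
public:
  explicit MyGPUTargetLowering(const TargetMachine &TM) : TargetLowering(TM) {}
  // Reporting truncates as non-free discourages DAG combines that would
  // re-widen intermediate values the target wants to keep narrow.
  bool isTruncateFree(Type *SrcTy, Type *DstTy) const override {
    return false;
  }
};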

zvi added a comment.Nov 28 2017, 12:03 AM

I'm really worried that the compile time hit of this for LTO will be non-negligible. Do you have numbers?

Will follow-up on this.

In D38313#937267, @zvi wrote:

Two comments on the trunc thing:

  1. Thank you!!! As a GPU target maintainer, one of my main frustrations is how much LLVM *loves* to generate code that is needlessly too wide when smaller would do. We mostly have avoided this problem due to being float-heavy, but as integer code becomes more important, I absolutely love any chance I can get to reduce 32-bit to 16-bit and save register space accordingly.

Sometimes it's LLVM, and sometimes it's the frontend that is required to extend small typed values before performing operations.

  2. I'm worried about this because the DAG *loves* to eliminate """redundant""" truncates and extensions, even if they're both marked as free. I've accidentally triggered infinite loops many times when trying to trick the DAG into emitting code that keeps intermediate variables small, an extreme example being something like this:
; pseudo-asm
; R1 = *b + (*a & 15);
; R2 = *c + (*a >> 16) & 15;
load.32 R0, [a]
load.32 R1, [b]
load.32 R2, [c]
shr.32 R0H, R0, 16
and.16 R0L, R0L, 15
and.16 R0H, R0H, 15
add.32 R1, R1, R0L
add.32 R2, R2, R0H

The DAG will usually try to turn this into this:

load.32 R0, [a]
load.32 R1, [b]
load.32 R2, [c]
shr.32 R3, R0, 16
and.32 R0, R0, 15
and.32 R3, R3, 15
add.32 R1, R1, R0
add.32 R2, R2, R3

This is just a hypothetical example, but in general this makes me worry, based on past attempts at experimentation in this realm.

I'm not sure I fully understand the concern with this patch, but if the problem is root-caused to Instruction Selection, shouldn't we fix it there? If DAGCombiner's elimination of free truncates/extensions is an issue, have you considered predicating the specific combines with TLI hooks?

There *are* TLI hooks; they're just not as widely used in the DAG as they could be.

I'm warning you with regard to this patch because the DAG may inadvertently undo a lot of the optimizations you're doing here. This isn't an objection, just something that might be worth looking at later given past experiences in trying to do similar optimizations.

zvi added a comment.Nov 29 2017, 11:10 PM
In D38313#937269, @zvi wrote:

I'm really worried that the compile time hit of this for LTO will be non-negligible. Do you have numbers?

Will follow-up on this.

Measured CTMark and internal tests and was not able to observe significant compile-time changes with -flto. Below are the results for CTMark:

Workload         | ToT: stdev/average of 10 runs [%] | This patch: stdev/average of 10 runs [%] | Average compile-time speedup of this patch over ToT (higher is better for this patch)
-----------------|-----------------------------------|------------------------------------------|-----------------------------------------
7zip             | 0.19%                             | 0.19%                                    | 0.999
Bullet           | 0.30%                             | 0.37%                                    | 0.998
ClamAV           | 0.39%                             | 0.19%                                    | 1.000
SPASS            | 0.52%                             | 0.33%                                    | 1.000
consumer-typeset | 0.27%                             | 0.36%                                    | 0.999
kimwitu++        | 0.45%                             | 0.49%                                    | 0.998
lencod           | 0.20%                             | 0.51%                                    | 1.001
mafft            | 0.63%                             | 0.29%                                    | 1.006
sqlite3          | 0.70%                             | 0.82%                                    | 1.002
tramp3d-v4       | 1.23%                             | 1.78%                                    | 0.990
craig.topper added inline comments.Dec 12 2017, 10:24 PM
lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
206

I'm not sure you've addressed my vector concerns. The first part of this 'if' would create a new type for vectors by using getSmallestLegalIntType.

218

I think the !Vector check that was here previously was correct. We don't do isLegalInteger checks on the scalar types of vectors. For vectors, we assume that if the type was present in the IR, the transform is fine. In this block, TruncWidth == MinBitWidth, so the type existed in the original IR. My vector concerns were about the block above, where we create a new type.

Thanks Zvi for addressing all the comments and questions while I was away.
Craig, please see the answers to your questions inlined below.

Thanks,
Amjad

lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
206

I think that in the "else" part, the one that I kept the same behavior as the original instcombine code, we might end up creating a vector type that was not in the IR, as for vector types we do not check for type legality.
So, why in this case we should behave differently?

Regarding the scalar check, it might be redundant, but not always, because even if the "trunc" instruction is performed on vector type, the evaluated expression might contain scalar operations (due to the "insertelement" instruction, which will be supported in next few patches).

Furthermore, my assumption is that codegen legalizer will promote the illegal vector type back to the original type (or to a smaller one), in both cases we will not get worse code than the one we started with!
Is that assumption too optimistic?

218

I agree with Craig; this needs to change back to:

if (!DstTy->isVectorTy() && FromLegal && !ToLegal)
aaboud added inline comments.Dec 19 2017, 4:47 AM
lib/Transforms/AggressiveInstCombine/TruncInstCombine.cpp
206

Just to emphasize, I am adding two examples.

Case 1 (MinBitWidth == TruncBitWidth):

%A1 = zext <2 x i32> %X to <2 x i64>
%B1 = mul <2 x i64> %A1, %A1
%C1 = extractelement <2 x i64> %B1, i32 0
%D1 = extractelement <2 x i64> %B1, i32 1
%E1 = add i64 %C1, %D1
%T = trunc i64 %E1 to i16
  =>
%A2 = trunc <2 x i32> %X to <2 x i16>
%B2 = mul <2 x i16> %A2, %A2
%C2 = extractelement <2 x i16> %B2, i32 0
%D2 = extractelement <2 x i16> %B2, i32 1
%T = add i16 %C2, %D2

Case 2 (MinBitWidth > TruncBitWidth):

%A1 = zext <2 x i32> %X to <2 x i64>
%B1 = lshr <2 x i64> %A1, <i64 8, i64 8>
%C1 = mul <2 x i64> %B1, %B1
%T = trunc <2 x i64> %C1 to <2 x i8>
  =>
%A2 = trunc <2 x i32> %X to <2 x i16>
%B2 = lshr <2 x i16> %A2, <i16 8, i16 8>
%C2 = mul <2 x i16> %B2, %B2
%T = trunc <2 x i16> %C2 to <2 x i8>

Notice that in both cases the "new" vector type (<2 x i16>) in the transformed IR did not exist in the original IR.

Don't you think that we should perform these transformations and reduce the expression type width?

Taking your first example and increasing the element count to get legal types:

define i16 @foo(<8 x i32> %X) {
  %A1 = zext <8 x i32> %X to <8 x i64>
  %B1 = mul <8 x i64> %A1, %A1
  %C1 = extractelement <8 x i64> %B1, i32 0
  %D1 = extractelement <8 x i64> %B1, i32 1
  %E1 = add i64 %C1, %D1
  %T = trunc i64 %E1 to i16
  ret i16 %T
}

define i16 @bar(<8 x i32> %X) {
  %A2 = trunc <8 x i32> %X to <8 x i16>
  %B2 = mul <8 x i16> %A2, %A2
  %C2 = extractelement <8 x i16> %B2, i32 0
  %D2 = extractelement <8 x i16> %B2, i32 1
  %T = add i16 %C2, %D2
  ret i16 %T
}

Then running that through llc with avx2, I get worse code for bar than for foo. Vector truncates on x86 aren't good. There is no truncate instruction until avx512, and even then it's 2 uops.

Taking your first example and increasing the element count to get legal types:

define i16 @foo(<8 x i32> %X) {
  %A1 = zext <8 x i32> %X to <8 x i64>
  %B1 = mul <8 x i64> %A1, %A1
  %C1 = extractelement <8 x i64> %B1, i32 0
  %D1 = extractelement <8 x i64> %B1, i32 1
  %E1 = add i64 %C1, %D1
  %T = trunc i64 %E1 to i16
  ret i16 %T
}

define i16 @bar(<8 x i32> %X) {
  %A2 = trunc <8 x i32> %X to <8 x i16>
  %B2 = mul <8 x i16> %A2, %A2
  %C2 = extractelement <8 x i16> %B2, i32 0
  %D2 = extractelement <8 x i16> %B2, i32 1
  %T = add i16 %C2, %D2
  ret i16 %T
}

Then running that through llc with avx2, I get worse code for bar than for foo. Vector truncates on x86 aren't good. There is no truncate instruction until avx512, and even then it's 2 uops.

I can "fix" that by ignoring cases where zext/sext will turn into a truncate for vector types, the check need to be done is:
"For each zext/sext instruction with vector type that have one usage, its source type size in bitwidth should be not less than the chosen MinBitWidth".

This will prevent creating the truncate, which was not in the IR before, on new vector types (or any vector type).
However, we will still have zext/sext to new vector type that was not in the IR before.

Does that solve the problem?

P.S. If you still insist on preventing this pass from creating new vector types, the solution is:

  1. Do not support extractelement/insertelement.
  2. Do not accept expressions with a vector-type truncate instruction where MinBitWidth > TruncBitWidth.

@spatel, what do you think about vector types here?

@spatel, what do you think about vector types here?

I’m not at a dev machine, so I can’t try any experiments. But we’ve had something like this question come up in one of my vector cmp + select patches. Ideally, we’d always shrink those as we do with scalars, but as noted, we may not have good backend support to undo the transform. Given that it’s not a clear win, I think it’s best to limit the vector transforms in this initial patch. Then, we can enable those in a follow-up patch if there are known wins and deal with any regressions without risking the main (?) scalar motivating cases.

aaboud commandeered this revision.Dec 21 2017, 2:02 PM
aaboud edited reviewers, added: zvi; removed: aaboud.

Thanks to Zvi for helping me make progress with this review while I was on vacation.
I will continue as the author from here.

aaboud updated this revision to Diff 127940.Dec 21 2017, 2:05 PM

Addressed Craig's and Sanjay's comments:

  1. Restored the support for vector types.
  2. Made sure that this transformation will not create a new vector type. This is achieved by allowing the reduction of expressions with vector types only when MinBitWidth == TruncBitWidth.
zvi added inline comments.Dec 24 2017, 6:08 AM
test/Transforms/AggressiveInstCombine/trunc_multi_uses.ll
2

Should there be negative tests for the vector cases that are not permitted to transform?

aaboud added inline comments.Dec 26 2017, 12:24 AM
test/Transforms/AggressiveInstCombine/trunc_multi_uses.ll
2

There should be.
However, such tests need instructions such as lshr, ashr, udiv, or urem, i.e., instructions that increase the MinBitWidth that we can truncate the expression to.
So I will add such tests in the following patches, once I add support for these instructions.

@mzolotukhin may want to comment on this one before it goes in, as he's spending a large part of his time doing compile-time work. Please wait for his opinion.

aaboud added a comment.Jan 6 2018, 3:39 AM

Hi,
I uploaded a new version about a week ago with the required change to avoid generating new vector types.
Please let me know if you have any other comments.

Thanks,
Amjad

I think the patch looks good now with the vector fix. Did you hear anything from @mzolotukhin about compile time?

Hi, and sorry for the late reply; I've just returned from the holiday break.
The numbers posted before look good. I wonder, though, if it would make sense to run this pass only at -O3. I assume that even if the pass spends very little time now, it will grow in the future, and the compile-time costs might become noticeable.

Michael

Hi, and sorry for the late reply; I've just returned from the holiday break.
The numbers posted before look good. I wonder, though, if it would make sense to run this pass only at -O3. I assume that even if the pass spends very little time now, it will grow in the future, and the compile-time costs might become noticeable.

Michael

Thanks Michael for the feedback.
As you said, the pass spends very little time; can't we decide on moving it to -O3 in the future, when/if other heavy optimizations are added to this pass?
And even then, we could decide to run some of them at -O2 and the rest at -O3.

Would that work for you?

Thanks,
Amjad

As you said, the pass spends very little time; can't we decide on moving it to -O3 in the future, when/if other heavy optimizations are added to this pass?
And even then, we could decide to run some of them at -O2 and the rest at -O3.

My main concern with that is that it's actually really hard to demote something to lower optlevels retroactively.

For instance, right now it would make sense to move some parts of the existing InstCombine out of O0. In practice it's a very time-consuming task to do so (to find the right pieces, to do all the measurements, to agree in the community on the acceptable regressions, etc.). And there is usually no single big heavy part that we can just move out and solve all the compile-time issues - we have many small pieces that just sum up to something big.

I expect that to be the case with this pass as well - people will add more stuff, but each individual piece would contribute only a little, so it'll be hard to say "yep, this one goes to -O3, others can stay at O0/O2". So I think it's worth moving the whole pass to -O3 now rather than in the future (and how bad is it for the other optlevels not to have it? Is it really critical?).

Michael

aaboud updated this revision to Diff 131053.Jan 23 2018, 6:10 AM

Moved the aggressive-inst-combine pass to run with -O3.
I prefer to make this change now, in order to get approval to commit the pass; in the future, once the pass is complete, I can argue for enabling it with -O2 in a separate discussion.

Please let me know if you have any more concerns regarding this patch.

Thanks.

No concerns from my side, thanks for making the change!

Michael

If there are no more concerns, can I get approval?
@craig.topper

This revision is now accepted and ready to land.Jan 24 2018, 2:18 AM
This revision was automatically updated to reflect the committed changes.