LGTM.
@Craig what do you think?
Apr 23 2018
Feb 5 2018
Feb 1 2018
LGTM.
Thanks for taking care of this.
Jan 31 2018
Jan 30 2018
Jan 28 2018
In D42536#988422, @spatel wrote: That's too easy - instcombine gets that. I think you need at least one more intermediate binop to show a case that instcombine can't handle.
You are right. I forgot that the instcombine algorithm can handle a zext/sext with multiple uses when it has the same type as the trunc instruction's destination type.
Here is a better test suggestion:
Jan 25 2018
In D42536#988261, @spatel wrote: I apologize for not noticing this before, but do all of the tests here fail in regular instcombine because there's an instruction that uses the same value more than once? Or are there more complicated patterns?
Jan 24 2018
If there are no more concerns, can I get approval?
@craig.topper
Jan 23 2018
Moved the aggressive-inst-combine pass to run with -O3.
I prefer to make this change now in order to get approval to commit the pass. Once the pass is complete, I can argue for enabling it at -O2 in a separate discussion.
Jan 14 2018
In D38313#971848, @mzolotukhin wrote: Hi and sorry for the late reply, I've just returned from the holidays break.
The numbers posted before look good. I wonder though if it would make sense to only run this pass on -O3. I assume that even if now the pass spends very little time, it will grow in future and the compile-time costs might become noticeable. Michael
Jan 6 2018
Hi,
I uploaded a new version about a week ago with the required change so that no new vector type is generated.
Please let me know if you have any other comments.
Dec 26 2017
Dec 21 2017
Addressed Craig's and Sanjay's comments:
- Restored the support for vector types.
- Made sure that this transformation will not create a new vector type. This is achieved by reducing an expression of vector type only when MinBitWidth == TruncBitWidth.
Thanks to Zvi for helping me progress with this review while I am on vacation.
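The vector-type rule in the list above can be sketched as a small predicate. This is only an illustration under stated assumptions: `mayReduceExpression` and its parameters are hypothetical names, not the patch's actual code or LLVM's API, and the sketch assumes the minimal width is never below the trunc destination width.

```cpp
#include <cassert>

// Hypothetical helper (not LLVM's API): decides whether a truncated
// expression may be re-evaluated at MinBitWidth bits.
//   OrigBitWidth  - scalar width the expression is currently computed in
//   TruncBitWidth - scalar width of the trunc destination type
//   MinBitWidth   - smallest width assumed to still give a correct result
bool mayReduceExpression(bool IsVector, unsigned OrigBitWidth,
                         unsigned TruncBitWidth, unsigned MinBitWidth) {
  // Assumption of this sketch: the minimal width is at least the trunc width.
  assert(MinBitWidth >= TruncBitWidth && "sketch expects MinBitWidth >= TruncBitWidth");

  // Nothing is gained if the expression cannot be narrowed at all.
  if (MinBitWidth >= OrigBitWidth)
    return false;

  // For vectors, only narrow straight to the trunc destination width, whose
  // vector type already exists in the IR; any other width would introduce a
  // new vector type.
  if (IsVector && MinBitWidth != TruncBitWidth)
    return false;

  return true;
}
```

Under these assumptions, a scalar expression may still be narrowed to an intermediate width, while a vector expression is narrowed only when it can go all the way to the trunc destination width.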
I will continue as an author from here.
Dec 19 2017
In D38313#959647, @craig.topper wrote: Taking your first example and increasing the element count to get legal types

    define i16 @foo(<8 x i32> %X) {
      %A1 = zext <8 x i32> %X to <8 x i64>
      %B1 = mul <8 x i64> %A1, %A1
      %C1 = extractelement <8 x i64> %B1, i32 0
      %D1 = extractelement <8 x i64> %B1, i32 1
      %E1 = add i64 %C1, %D1
      %T = trunc i64 %E1 to i16
      ret i16 %T
    }

    define i16 @bar(<8 x i32> %X) {
      %A2 = trunc <8 x i32> %X to <8 x i16>
      %B2 = mul <8 x i16> %A2, %A2
      %C2 = extractelement <8 x i16> %B2, i32 0
      %D2 = extractelement <8 x i16> %B2, i32 1
      %T = add i16 %C2, %D2
      ret i16 %T
    }

Then running that through llc with avx2, I get worse code for bar than foo. Vector truncates on x86 aren't good. There is no truncate instruction until avx512 and even then it's 2 uops.
Dec 18 2017
Thanks Zvi for addressing all comments and questions while I am away.
Craig, please see the answers to your questions inline below.
Nov 2 2017
Minor typo update.
Minor fix: I forgot to use IRBuilder in one case in the previous patch.
Addressed Zvi's Comments.
Thanks Zvi for the comments.
I will upload a new patch with most of the comments fixed.
See a few answers below.
Nov 1 2017
Addressed Hal's comments.
Thanks Hal for the review.
I will update the patch with the changes you asked for.
Also, see one answer below.
Oct 30 2017
Separated the implementation from the InstCombine pass and introduced a new pass called AggressiveInstCombine. It is called only twice (compared to InstCombine, which is called ~6 times): once as part of the function simplification passes and once as part of the LTO optimization passes.
Oct 26 2017
Addressed Zvi's comments.
Thanks Zvi for the comments.
I fixed most of them and will upload a new patch soon.
Oct 25 2017
In D38313#907218, @spatel wrote: In D38313#906667, @aaboud wrote: To summarize, I really do not mind moving it to a separate pass, but I would like to get this optimization committed as soon as possible.
I appreciate your review and direction; please advise on the best way you think I should implement this optimization.
As I said, I'm deferring to others on the way forward, so if everyone else thinks this is good, then I'm not objecting. Others have looked at the code closer than me, so I'll let them provide more feedback and/or approval.
In D38313#906511, @spatel wrote: In D38313#906324, @aaboud wrote: Having said that, I believe that running all current instcombine tests with this new functionality is a must. To make that possible, it is obvious that we need to be part of the instcombine pass.
Note that the implementation is done in a way that lets it move to a separate pass with zero effort, but at the cost of ignoring/dropping a few hundred LIT tests.
I agree that testing must be done, but I don't see how that makes it obvious that this should be part of instcombine? If you're concerned that something else in instcombine will inhibit or invert this transform, you could add tests under test/Transforms/PhaseOrdering/ to make sure that doesn't happen. I think you've done the hard part (the code itself) already. :)
The major disadvantage of being in instcombine is that this code will be running 5-6 times in a typical pipeline when it probably doesn't need to.
Answered David's comment.
Updated patch according to Craig's comment (fixed a minor logic bug).
In D38313#903090, @spatel wrote: If I'm seeing this correctly, it's an independent pass within InstCombine. It sits outside InstCombine's iteration loop, so it doesn't interact with the rest of the pass. What is the advantage of this approach vs. making a standalone pass?
Oct 18 2017
Oct 15 2017
Addressed Craig's and Zia's comments.
Thanks Craig and Zia for the review, and sorry for the late answer.
Please see the answers below.
Oct 3 2017
Do we have kill flags set? If so, we could move a use of a virtual register across its kill... But maybe there is some reason that ensures this is not the case.
Addressed Chandler's comment.
Thanks Chandler for reviewing the code.
Oct 2 2017
In D38359#885162, @ormris wrote: This fixes the code differences I reported.
Called "initializeX86CmovConverterPassPass" also from the pass constructor.
Addressed Craig's comment.
Sep 28 2017
Sep 27 2017
A new approach was uploaded to D38313.
Fixed "no new line at end of file" issue.
Sep 25 2017
Sep 6 2017
Sep 5 2017
So, how do you wish to proceed from here?
Do you think that such an optimization should be moved to a separate new pass, even though it would do something very similar to InstCombine and just catch more cases than InstCombine does today?
Aug 30 2017
Aug 29 2017
In D37195#855376, @spatel wrote: I didn't look closely at the implementation, but extending instcombine with visitation/evaluation maps may cross the boundary of what instcombine should handle. Adding some more potential reviewers.
Fixed typo and naming according to Craig's comments.
Thanks Craig for the comments.
Will upload an updated patch soon.
Aug 27 2017
Fixed some typos.
Aug 25 2017
Aug 24 2017
These changes look good to me.
Any more comments?
Aug 23 2017
Aug 22 2017
I have one comment below.
By the way, I noticed that the double AssertZero occurs only for x86_64 (it does not happen on i386).
It might be worth checking where it comes from, regardless of this patch.
Aug 21 2017
Updated patch to address Craig's comment.
Aug 20 2017
Aug 19 2017
There is still an issue with this implementation; here is another reproducer:
Aug 18 2017
In D36858#845606, @chandlerc wrote: While I'm writing the fix, and since it is already late in your TZ -- any other concerns before I land this?
Thanks Chandler for preparing the patch. The implementation looks elegant; however, it overlooks a case where the memory registers are the result of previous CMOV instructions.
This is a small reproducer that results in bad MIR:
LGTM.
Aug 17 2017
After committing the other parts, this patch was reduced to handle only the multiply in ComputeNumSignBitsImpl.
Aug 16 2017
In D36784#843566, @spatel wrote: LGTM.
Just to make sure I'm understanding: the multiply patch (D36679) is independent of this?
The code looks good; just a few comments need to be fixed/added (see below).