Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Phabricator shutdown timeline

aaboud (Amjad Aboud)
User

Projects

User does not belong to any projects.

User Details

User Since
Dec 4 2014, 6:14 AM (459 w, 6 d)

Recent Activity

Apr 23 2018

aaboud added a comment to D45938: Improves x86 interrupt error message on argument type error.

LGTM.
@Craig what do you think?

Apr 23 2018, 3:37 AM
aaboud updated subscribers of D45938: Improves x86 interrupt error message on argument type error.
Apr 23 2018, 3:37 AM

Feb 5 2018

aaboud added inline comments to D42739: [InstCombine] allow multi-use values in canEvaluate* if all uses are in 1 inst.
Feb 5 2018, 7:41 AM

Feb 1 2018

aaboud accepted D42739: [InstCombine] allow multi-use values in canEvaluate* if all uses are in 1 inst.

LGTM.
Thanks for taking care of this.

Feb 1 2018, 6:38 AM

Jan 31 2018

aaboud committed rL323926: [AggressiveInstCombine] Fixed TruncCombine class to handle TruncInst leaf node….
[AggressiveInstCombine] Fixed TruncCombine class to handle TruncInst leaf node…
Jan 31 2018, 2:40 PM
aaboud closed D42622: [AggressiveInstCombine] fixed PR36121.
Jan 31 2018, 2:40 PM
aaboud committed rL323862: [AggressiveInstCombine] Make TruncCombine class ignore unreachable basic blocks..
[AggressiveInstCombine] Make TruncCombine class ignore unreachable basic blocks.
Jan 31 2018, 2:44 AM
aaboud closed D42683: [AggressiveInstCombine] fixed PR36134.
Jan 31 2018, 2:44 AM
aaboud added inline comments to D42622: [AggressiveInstCombine] fixed PR36121.
Jan 31 2018, 1:14 AM

Jan 30 2018

aaboud created D42683: [AggressiveInstCombine] fixed PR36134.
Jan 30 2018, 4:12 AM

Jan 28 2018

aaboud added a comment to D42536: [AggresiveInstCombine] Added support of select and ShuffleVector to TruncInstCombine expression pattern.

That's too easy - instcombine gets that. I think you need at least one more intermediate binop to show a case that instcombine can't handle.

You are right, I forgot that instcombine algorithm can handle zext/sext with multi use in case they have same type as trunc instruction destination type.
Here is a better test suggestion:

Jan 28 2018, 7:38 AM
aaboud created D42622: [AggressiveInstCombine] fixed PR36121.
Jan 28 2018, 7:18 AM

Jan 25 2018

aaboud added a comment to D42536: [AggresiveInstCombine] Added support of select and ShuffleVector to TruncInstCombine expression pattern.

I apologize for not noticing this before, but do all of the tests here fail in regular instcombine because there's an instruction that uses the same value more than once? Or are there more complicated patterns?

Jan 25 2018, 11:27 AM
aaboud created D42536: [AggresiveInstCombine] Added support of select and ShuffleVector to TruncInstCombine expression pattern.
Jan 25 2018, 7:06 AM
aaboud committed rL323416: Another try to commit 323321 (aggressive instruction combine)..
Another try to commit 323321 (aggressive instruction combine).
Jan 25 2018, 4:08 AM

Jan 24 2018

aaboud committed rL323326: Reverted 323321..
Reverted 323321.
Jan 24 2018, 6:50 AM
aaboud added a reverting change for rL323321: [InstCombine] Introducing Aggressive Instruction Combine pass (-aggressive…: rL323326: Reverted 323321..
Jan 24 2018, 6:50 AM
aaboud committed rL323321: [InstCombine] Introducing Aggressive Instruction Combine pass (-aggressive….
[InstCombine] Introducing Aggressive Instruction Combine pass (-aggressive…
Jan 24 2018, 4:44 AM
aaboud closed D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.
Jan 24 2018, 4:44 AM
aaboud added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

If there is no more concerns, can I get approval?
@craig.topper

Jan 24 2018, 1:32 AM

Jan 23 2018

aaboud updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Moved the aggressive-inst-combine pass to run with -O3.
I prefer to make this change now, in order to get approval to commit the pass, and in the future, once the pass is complete, to argue enabling it with -O2, in a separate discussion.

Jan 23 2018, 6:13 AM

Jan 14 2018

aaboud added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Hi and sorry for the late reply, I've just returned from the holidays break.
The numbers posted before look good. I wonder though if it would make sense to only run this pass on -O3. I assume that even if now the pass spends very little time, it will grow in future and the compile-time costs might become noticeable.

Michael

Jan 14 2018, 6:59 AM

Jan 6 2018

aaboud added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Hi,
I uploaded a new version about a week ago with the required change for not generating new vector type.
Please, let me know if you have any other comments.

Jan 6 2018, 3:39 AM

Dec 26 2017

aaboud added inline comments to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.
Dec 26 2017, 12:24 AM

Dec 21 2017

aaboud updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Addressed Craig and Sanjay comments:

  1. Retrieve the support for vector types.
  2. Make sure that this transformation will not create a new vector type. This is achieved by allowing reducing expression with vector type only when MinBitWidth == TruncBitWidth.
Dec 21 2017, 2:05 PM
aaboud commandeered D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Thanks for Zvi for helping me progress with this review while I am on vacation.
I will continue as an author from here.

Dec 21 2017, 2:02 PM

Dec 19 2017

aaboud added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Taking your first example and increasing the element count to get legal types

define i16 @foo(<8 x i32> %X) {
  %A1 = zext <8 x i32> %X to <8 x i64>
  %B1 = mul <8 x i64> %A1, %A1
  %C1 = extractelement <8 x i64> %B1, i32 0
  %D1 = extractelement <8 x i64> %B1, i32 1
  %E1 = add i64 %C1, %D1
  %T = trunc i64 %E1 to i16
  ret i16 %T
}

define i16 @bar(<8 x i32> %X) {
  %A2 = trunc <8 x i32> %X to <8 x i16>
  %B2 = mul <8 x i16> %A2, %A2
  %C2 = extractelement <8 x i16> %B2, i32 0
  %D2 = extractelement <8 x i16> %B2, i32 1
  %T = add i16 %C2, %D2
  ret i16 %T
}

Then running that through llc with avx2. I get worse code for bar than foo. Vector truncates on x86 aren't good. There is no truncate instruction until avx512 and even then its 2 uops.

Dec 19 2017, 11:14 PM
aaboud added inline comments to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.
Dec 19 2017, 4:47 AM

Dec 18 2017

aaboud added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Thanks Zvi for addressing all comments and questions while I am away.
Craig, please, see answers for your questions inlined below.

Dec 18 2017, 6:18 AM

Nov 2 2017

aaboud updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Minor typo update.

Nov 2 2017, 9:30 AM
aaboud updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Minor fix, forgot to use IRBuilder in one case in the previous patch.

Nov 2 2017, 8:30 AM
aaboud updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Addressed Zvi's Comments.

Nov 2 2017, 8:16 AM
aaboud added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Thanks Zvi for the comments.
I will upload a new patch with most of the comments fixed.
See few answers below.

Nov 2 2017, 8:03 AM

Nov 1 2017

aaboud updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Addressed Hal's comments.

Nov 1 2017, 10:53 AM
aaboud added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Thanks Hal for the review.
I will update the patch with the changes you ask for.
Also, see one answer below.

Nov 1 2017, 10:50 AM

Oct 30 2017

aaboud updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Separated the implementation from InstCombine pass and introduced a new pass called AggressiveInstCombine, which is called only twice (compared InstCombine, which is called ~6 times), one time as part of function simplification passes and second time as part of LTO optimization passes.

Oct 30 2017, 7:12 AM

Oct 26 2017

aaboud updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Addressed Zvi's comments.

Oct 26 2017, 7:51 AM
aaboud added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Thanks Zvi for the comments.
I fixed most of them and will upload a new patch soon.

Oct 26 2017, 7:45 AM

Oct 25 2017

aaboud added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

To summarize, I really do not mind to move it to separate pass, but I would like to make this optimization committed as soon as possible.
I appreciate your review and direction, please, advice with the best way you think I should implement this optimization.

As I said, I'm deferring to others on the way forward, so if everyone else thinks this is good, then I'm not objecting. Others have looked at the code closer than me, so I'll let them provide more feedback and/or approval.

Oct 25 2017, 3:58 PM
aaboud added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Saying that, I believe that running all current instcombine tests with this new functionality is a must, in order to make that possible, it is obvious that we need to be part of instcombine pass.

Note that the implementation is done in a way that moving it to a separate pass can be done with zero effort, but in a cost of ignoring/dropping few hundreds of LIT tests.

I agree that testing must be done, but I don't see how that makes it obvious that this should be part of instcombine? If you're concerned that something else in instcombine will inhibit or invert this transform, you could add tests under test/Transforms/PhaseOrdering/ to make sure that doesn't happen. I think you've done the hard part (the code itself) already. :)

The major disadvantage of being in instcombine is that this code will be running 5-6 times in a typical pipeline when it probably doesn't need to.

Oct 25 2017, 8:23 AM
aaboud updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Answered David's comment.

Oct 25 2017, 8:10 AM
aaboud updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Updated patch according to Craig comment. (Fixed minor logical bug)

Oct 25 2017, 2:36 AM
aaboud updated subscribers of D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.
Oct 25 2017, 2:25 AM
aaboud added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

If I'm seeing this correctly, it's an independent pass within InstCombine. It sits outside InstCombine's iteration loop, so it doesn't interact with the rest of the pass. What is the advantage of this approach vs. making a standalone pass?

Oct 25 2017, 2:24 AM

Oct 18 2017

aaboud added inline comments to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.
Oct 18 2017, 8:23 AM

Oct 15 2017

aaboud updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Addressed Craig and Zia comments.

Oct 15 2017, 5:55 AM
aaboud added a comment to D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Thanks Craig and Zia for the review and sorry for the late answer.
Please, see answers below.

Oct 15 2017, 5:55 AM
aaboud committed rL315851: [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve….
[X86] Ignore DBG instructions in X86CmovConversion optimization to resolve…
Oct 15 2017, 4:01 AM
aaboud closed D38359: [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565 by committing rL315851: [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve….
Oct 15 2017, 4:01 AM

Oct 3 2017

aaboud added a comment to D38359: [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565.

Do we have kill flags set? If so, we could move a use of a virtual register across its kill... But maybe there is some reason to assure that this is not the case.

Oct 3 2017, 1:41 AM
aaboud updated the diff for D38359: [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565.

Addressed Chandler comment.

Oct 3 2017, 12:21 AM
aaboud added a comment to D38359: [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565.

Thanks Chandler for reviewing the code.

Oct 3 2017, 12:20 AM

Oct 2 2017

aaboud committed rL314726: [X86][NFC] Add X86CmovConverterPass to the pass registry..
[X86][NFC] Add X86CmovConverterPass to the pass registry.
Oct 2 2017, 2:48 PM
aaboud closed D38355: [X86][NFC] Add X86CmovConverterPass to the pass registry by committing rL314726: [X86][NFC] Add X86CmovConverterPass to the pass registry..
Oct 2 2017, 2:48 PM
aaboud added a comment to D38359: [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565.

This fixes the code differences I reported.

Oct 2 2017, 2:21 PM
aaboud added inline comments to D38355: [X86][NFC] Add X86CmovConverterPass to the pass registry.
Oct 2 2017, 10:10 AM
aaboud updated the diff for D38355: [X86][NFC] Add X86CmovConverterPass to the pass registry.

Called "initializeX86CmovConverterPassPass" also from the pass constructor.

Oct 2 2017, 10:10 AM
aaboud updated the diff for D38355: [X86][NFC] Add X86CmovConverterPass to the pass registry.

Addressed Craig comment.

Oct 2 2017, 10:03 AM
aaboud added inline comments to D38355: [X86][NFC] Add X86CmovConverterPass to the pass registry.
Oct 2 2017, 10:02 AM

Sep 28 2017

aaboud created D38359: [X86] Ignore DBG instructions in X86CmovConversion optimization to resolve PR34565.
Sep 28 2017, 7:34 AM
aaboud created D38355: [X86][NFC] Add X86CmovConverterPass to the pass registry.
Sep 28 2017, 5:27 AM

Sep 27 2017

aaboud abandoned D37195: [InstCombine] Teach canEvaluateTruncated and EvaluateInDifferentType to handle expression tree with multi-used nodes..

New approach was uploaded to D38313.

Sep 27 2017, 6:23 AM
aaboud updated the diff for D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.

Fixed "no new line at end of file" issue.

Sep 27 2017, 5:21 AM
aaboud created D38313: [InstCombine] Introducing Aggressive Instruction Combine pass.
Sep 27 2017, 5:19 AM

Sep 25 2017

aaboud added inline comments to D37251: [X86] Add a pass to convert instruction chains between domains.
Sep 25 2017, 5:02 AM

Sep 6 2017

aaboud added inline comments to D37504: [x86] Fix PR34377 by disabling cmov conversion when we relied on it performing a zext of a register..
Sep 6 2017, 2:13 AM
aaboud added inline comments to D37504: [x86] Fix PR34377 by disabling cmov conversion when we relied on it performing a zext of a register..
Sep 6 2017, 1:53 AM
aaboud updated subscribers of D37504: [x86] Fix PR34377 by disabling cmov conversion when we relied on it performing a zext of a register..
Sep 6 2017, 1:52 AM

Sep 5 2017

aaboud added a comment to D37195: [InstCombine] Teach canEvaluateTruncated and EvaluateInDifferentType to handle expression tree with multi-used nodes..

So, how do you wish to proceed from here?
Do you think that such optimization should be moved to separate new pass? Though it will be doing very similar thing as InstCombine, just will catch more cases that it does today?

Sep 5 2017, 1:33 AM

Aug 30 2017

aaboud added inline comments to D37289: [X86] Speculatively load operands of select instruction.
Aug 30 2017, 3:30 PM

Aug 29 2017

aaboud added inline comments to D37251: [X86] Add a pass to convert instruction chains between domains.
Aug 29 2017, 7:54 AM
aaboud added a comment to D37195: [InstCombine] Teach canEvaluateTruncated and EvaluateInDifferentType to handle expression tree with multi-used nodes..

I didn't look closely at the implementation, but extending instcombine with visitation/evaluation maps may cross the boundary of what instcombine should handle. Adding some more potential reviewers.

Aug 29 2017, 6:17 AM
aaboud updated the diff for D37195: [InstCombine] Teach canEvaluateTruncated and EvaluateInDifferentType to handle expression tree with multi-used nodes..

Fixed typo and naming according to Craig comments.

Aug 29 2017, 3:41 AM
aaboud added a comment to D37195: [InstCombine] Teach canEvaluateTruncated and EvaluateInDifferentType to handle expression tree with multi-used nodes..

Thanks Craig for the comments.
Will upload an updated patch soon.

Aug 29 2017, 3:41 AM

Aug 27 2017

aaboud updated the diff for D37195: [InstCombine] Teach canEvaluateTruncated and EvaluateInDifferentType to handle expression tree with multi-used nodes..

fixed some typos.

Aug 27 2017, 8:32 AM
aaboud created D37195: [InstCombine] Teach canEvaluateTruncated and EvaluateInDifferentType to handle expression tree with multi-used nodes..
Aug 27 2017, 8:30 AM

Aug 25 2017

aaboud committed rL311773: [InstCombine] Consider more cases where SimplifyDemandedUseBits does not….
[InstCombine] Consider more cases where SimplifyDemandedUseBits does not…
Aug 25 2017, 4:09 AM
aaboud closed D36936: [InstCombine] Consider more cases where SimplifyDemandedUseBits do not converting AShr to LShr by committing rL311773: [InstCombine] Consider more cases where SimplifyDemandedUseBits does not….
Aug 25 2017, 4:09 AM

Aug 24 2017

aaboud accepted D36840: [DAG] convert vector select-of-constants to logic/math.
Aug 24 2017, 3:25 PM
aaboud added a comment to D36840: [DAG] convert vector select-of-constants to logic/math.

These changes looks good to me.

Aug 24 2017, 3:13 PM
aaboud added a comment to D36936: [InstCombine] Consider more cases where SimplifyDemandedUseBits do not converting AShr to LShr .

Any more comments?

Aug 24 2017, 8:57 AM
aaboud added inline comments to D37069: [x86] use the IR type of formal args to create assertzext/assertsext and scalar truncate nodes.
Aug 24 2017, 8:19 AM
aaboud added inline comments to D37069: [x86] use the IR type of formal args to create assertzext/assertsext and scalar truncate nodes.
Aug 24 2017, 5:08 AM

Aug 23 2017

aaboud added inline comments to D37069: [x86] use the IR type of formal args to create assertzext/assertsext and scalar truncate nodes.
Aug 23 2017, 1:39 PM

Aug 22 2017

aaboud added a comment to D37017: [DAGCombiner] fold assertzexts separated by trunc.

I have one comment below.
By the way, I noticed that the double AssertZero occur only for the x86_64 (in i386 it does not happen).
It might be worth checking where it comes from, regardless of this patch.

Aug 22 2017, 2:14 PM

Aug 21 2017

aaboud updated the diff for D36936: [InstCombine] Consider more cases where SimplifyDemandedUseBits do not converting AShr to LShr .

Updated patch to address Craig's comment.

Aug 21 2017, 4:28 AM

Aug 20 2017

aaboud created D36936: [InstCombine] Consider more cases where SimplifyDemandedUseBits do not converting AShr to LShr .
Aug 20 2017, 11:30 AM

Aug 19 2017

aaboud added a comment to D36858: [x86] Teach the cmov converter to aggressively convert cmovs with memory operands into control flow..

There is still an issue with this implementation, here another reproducer:

Aug 19 2017, 11:55 AM

Aug 18 2017

aaboud committed rL311206: [InstCombine] Teach ComputeNumSignBitsImpl to handle integer multiply….
[InstCombine] Teach ComputeNumSignBitsImpl to handle integer multiply…
Aug 18 2017, 3:58 PM
aaboud closed D36679: [InstCombine] Added support for: trunc(ashr(mul(sext(...), sext(...))) -> ashr(mul(...)) by committing rL311206: [InstCombine] Teach ComputeNumSignBitsImpl to handle integer multiply….
Aug 18 2017, 3:58 PM
aaboud added a comment to D36858: [x86] Teach the cmov converter to aggressively convert cmovs with memory operands into control flow..

While I'm writing the fix, and since it is already late in your TZ -- any other concerns before I land this?

Aug 18 2017, 10:39 AM
aaboud added a comment to D36858: [x86] Teach the cmov converter to aggressively convert cmovs with memory operands into control flow..

Thanks Chandler for preparing the patch, the implementation looks elegant, however, it overlooked a case where the memory registers are a result of a previous CMOV instructions.
This is a small reproducer that result in bad MIR:

Aug 18 2017, 6:01 AM
aaboud accepted D36783: [x86] Refactor the CMOV conversion pass to be more flexible..

LGTM.

Aug 18 2017, 3:30 AM

Aug 17 2017

aaboud updated the diff for D36679: [InstCombine] Added support for: trunc(ashr(mul(sext(...), sext(...))) -> ashr(mul(...)).

After committing the other parts, this patch was reduced to handle only the multiply in ComputeNumSignBitsImpl.

Aug 17 2017, 7:19 AM
aaboud committed rL311082: [X86] Refactoring of X86TargetLowering::EmitLoweredSelect. NFC..
[X86] Refactoring of X86TargetLowering::EmitLoweredSelect. NFC.
Aug 17 2017, 5:14 AM
aaboud closed D35685: [X86] Refactoring of X86TargetLowering::EmitLoweredSelect. nfc. by committing rL311082: [X86] Refactoring of X86TargetLowering::EmitLoweredSelect. NFC..
Aug 17 2017, 5:14 AM

Aug 16 2017

aaboud committed rL311050: [InstCombine] Teach canEvaluateTruncated to handle arithmetic shift (including….
[InstCombine] Teach canEvaluateTruncated to handle arithmetic shift (including…
Aug 16 2017, 3:43 PM
aaboud closed D36784: [InstCombine] Teach canEvaluateTruncated to handle arithmetic shift (including those with vector splat shift amount) by committing rL311050: [InstCombine] Teach canEvaluateTruncated to handle arithmetic shift (including….
Aug 16 2017, 3:43 PM
aaboud added a comment to D36784: [InstCombine] Teach canEvaluateTruncated to handle arithmetic shift (including those with vector splat shift amount).

LGTM.

Just to make sure I'm understanding: the multiply patch (D36679) is independent of this?

Aug 16 2017, 2:40 PM
aaboud added a comment to D36783: [x86] Refactor the CMOV conversion pass to be more flexible..

The code looks good, just few comments need to be fixed/added (see below)

Aug 16 2017, 7:22 AM