Page MenuHomePhabricator

opaparo (Omer Paparo Bivas)
User

Projects

User does not belong to any projects.

User Details

User Since
May 4 2016, 4:18 AM (222 w, 3 d)

Recent Activity

Oct 28 2018

opaparo accepted D53767: [X86] Convert MIR test to IR test.

Unfortunately, the change that this test was designated to test never matured into a patch, so it was left as an orphan.
For the time being, you can even delete it altogether.

Oct 28 2018, 1:14 AM

Aug 31 2018

opaparo added a comment to D41574: [Transforms] Adding a WeakReassociate pass.

It's not entirely clear to me why you need another reassociate pass.
We should really try to share as much code as possible with the existing one instead of adding maintenance burden.

Although both of the passes perform reassociation, they are essentially very different.
The Reassociate pass aggressively modifies the topography of the expression tree, basically rewriting it, while the WeakReassociate pass uses the existing topography.
The Reassociate pass creates new instructions while the WeakReassociate pass does not.
Moreover, their method of reassociation is completely different. To modify the existing Reassociate pass to supports reassociation of the form (a_0 + a_1) + (a_2 + a_3) -> (a_0 + a_2) + (a_1 + a_3) will take a great amount of changes, changes that will almost certainly collide with existing behavior and test cases. For example, how will it decide whether (a_0 + a_1) + (a_2 + a_3) will be transformed into a_0 + (a_1 + (a_2 + a_3)) like its current behavior or to (a_0 + a_2) + (a_1 + a_3) like the new behavior?
The existing Reassociate pass lacks, by design, several kinds of reassociations, kinds that I believe can find a home in the new pass.
IMO, maintaining one big pass that has two distinguished purposes can prove to be a greater burden than maintaining two smaller passes, each dedicated to its own purpose.

(Only trying to understand) From what I see the main consumers of reassociation are Global Value Numbering to discover more equivalences and instruction combining.
The former can probably embed reassociation while performing equivalence finding (in fact, there are algorithms where this happens), the latter already performs a fair amount of "canonicalization" (sometimes, too much). I wonder where your pass fits/what's the use case? Do you plan to enable it as part of the default pipeline?

Thanks,

Davide

The new pass is a kind of preparation pass for InstCombine. It complements changes introduced at D39421, as well as other InstCombine transformations. Checkout my lit tests, they illustrate several scenarios.
I believe that this pass will open a gate to other reassociations required by InstCombine transformations, reassociations that can't be expressed in means of the existing Reassociate pass.

@davide, does that answer your questions? Do you have any additional concerns?

Aug 31 2018, 7:16 AM
opaparo updated the diff for D41574: [Transforms] Adding a WeakReassociate pass.

Rebased and added lit tests

Aug 31 2018, 7:14 AM

Jul 10 2018

opaparo committed rC336658: Fixing builtin __atomic_fetch_min declaration.
Fixing builtin __atomic_fetch_min declaration
Jul 10 2018, 5:09 AM
opaparo committed rL336658: Fixing builtin __atomic_fetch_min declaration.
Fixing builtin __atomic_fetch_min declaration
Jul 10 2018, 5:09 AM
opaparo closed D49068: Fixing builtin __atomic_fetch_min definition.
Jul 10 2018, 5:09 AM
opaparo closed D49068: Fixing builtin __atomic_fetch_min definition.
Jul 10 2018, 5:09 AM

Jul 9 2018

opaparo created D49068: Fixing builtin __atomic_fetch_min definition.
Jul 9 2018, 4:13 AM

Jun 17 2018

opaparo added a comment to D41574: [Transforms] Adding a WeakReassociate pass.

Ping.

Jun 17 2018, 8:17 AM
opaparo updated the diff for D46380: [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr'.

Rebasing the patch and adjusting it to the changes made in rL333689.
Removed special code for CastInst in InstructionSimplify::SimplifyWithOpReplaced as rL333689's changes handle that (in a different way).

Jun 17 2018, 8:14 AM

May 28 2018

opaparo added a comment to D41574: [Transforms] Adding a WeakReassociate pass.

What are benchmarks results with this pass (compile-time/performance)? Do we really need it at all optlevels, or can we only include it at -O3?

May 28 2018, 9:38 AM
opaparo updated the diff for D41574: [Transforms] Adding a WeakReassociate pass.

Several changes in this revision:

May 28 2018, 9:16 AM
opaparo accepted D47163: [InstCombine] don't change the size of a select if it would mismatch its condition operands' sizes.

@opaparo can you review this patch? I'm wondering if this will help any of the motivating cases for D46380.

May 28 2018, 9:06 AM

May 22 2018

opaparo updated the diff for D41574: [Transforms] Adding a WeakReassociate pass.

Following this discussion, rebasing the patch to trunk.

May 22 2018, 9:22 AM
opaparo updated the diff for D46380: [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr'.

Changing 'dyn_cast_or_null' to 'dyn_cast' in two places where the operator cannot be null.

May 22 2018, 7:44 AM
opaparo added a comment to D46380: [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr'.
  1. This is an instsimplify patch - the review title should be changed; there should be minimal tests for each transform under tests/Transforms/InstSimplify.

The title was changed. However, those changes do not affect InstSimplify directly, as they are not used by this pass. The only functional change is in InstCombine, which uses these parts of InstSimplify as a library.

I don't understand this statement. The proposal affects SimplifyWithOpReplaced() which is only called from within instsimplify (simplifySelectWithICmpCond), so every test should be visible using only -instsimplify. The first test should be reduced to something like this:

define i64 @sel_false_val_is_a_masked_shl_of_true_val1(i32 %x) {
  %x15 = and i32 %x, 15
  %sh = shl nuw nsw i32 %x15, 2
  %z = zext i32 %sh to i64
  %cmp = icmp eq i32 %x15, 0
  %r = select i1 %cmp, i64 0, i64 %z
  ret i64 %r
}

You're right. I've originally missed the direct effect on InstSimplify, and I will change the tests location and structure if this change is accepted, but please also note that lib/Transforms/InstCombine/InstCombineSelect.cpp calls SimplifySelectInst which calls simplifySelectWithICmpCond which in turn calls SimplifyWithOpReplaced. Hence, InstCombine also benefits from this functional change.

But this example doesn't need nsw/nuw, so what is this test trying to demonstrate?
https://rise4fun.com/Alive/9zX

Exactly that. In the current status, InstCombine and InstSimplify will not perform this transformation despite it's correctness. Furthermore, if you remove the nsw/nuw flags from the 'shl' in this example, InstCombine will first identify that the 'shl' really has no signed and unsigned wraps and will add the flags before the 'select' had the chance to optimize, which will result in the same un-transformed code. This is exactly one of the missed transformation opportunities I'm trying to cease.

Also, this is still using ValueTracking, so I'm not comfortable continuing this review until we have more data about the cost and benefits.

You're right that ValueTracking is still used, but running this change on a large set of benchmark did not yield any noticeable compile-time regressions. In addition, the two functions from ValueTracking that I'm using (ComputeNumSignBits and MaskedValueIsZero) are already in use in InstructionSimplify.cpp.

May 22 2018, 7:42 AM

May 14 2018

opaparo added a comment to D46380: [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr'.
  1. This is an instsimplify patch - the review title should be changed; there should be minimal tests for each transform under tests/Transforms/InstSimplify.

The title was changed. However, those changes do not affect InstSimplify directly, as they are not used by this pass. The only functional change is in InstCombine, which uses these parts of InstSimplify as a library.

  1. This is using ValueTracking which can be expensive in compile-time - do you have any stats for how often this fires, perf improvements, compile-time regressions? (adding Eli for thoughts about where we would draw that line because I really don't know)

After a close look I discovered some compile-time regressions, as you predicted. I removed the safety checks for 'add', 'sub' and 'mul', which require ValueTracking (so now there are only safety checks for 'shl', 'ashr' and 'lshr'). That elimintated those regressions.

May 14 2018, 8:40 AM
opaparo updated the diff for D46380: [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr'.

Removing safety checks for 'add', 'sub' and 'mul', since they require ValueTracking which proved to be too expensive in compile time.

May 14 2018, 8:35 AM
opaparo retitled D46380: [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr' from [InstCombine] Extending InstructionSimplify to check OverflowingBinaryOperators and PossiblyExactOperators safety to [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr'.
May 14 2018, 8:32 AM

May 10 2018

opaparo committed rL332026: [InstCombine] Moving overflow computation logic from InstCombine to….
[InstCombine] Moving overflow computation logic from InstCombine to…
May 10 2018, 12:50 PM
opaparo closed D46704: [NFC][InstCombine] Moving overflow computation logic from InstCombine to ValueTracking.
May 10 2018, 12:49 PM
opaparo added a comment to D46380: [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr'.

Moving the functions from instcombine to valuetracking is NFC, right? If so, can you do that now, so this patch is minimized? The similar functions should be in one place already.

May 10 2018, 10:16 AM
opaparo updated the diff for D46380: [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr'.
May 10 2018, 10:15 AM
opaparo created D46704: [NFC][InstCombine] Moving overflow computation logic from InstCombine to ValueTracking.
May 10 2018, 10:13 AM
opaparo added a comment to D46380: [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr'.

Ping.

May 10 2018, 8:57 AM

May 3 2018

opaparo updated the summary of D46380: [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr'.
May 3 2018, 12:05 AM

May 2 2018

opaparo abandoned D41353: [InstCombine] Adjusting bswap pattern to the new masked shl canonization.

The required changes were made in D45731

May 2 2018, 11:47 PM
opaparo created D46380: [InstSimplify] Adding safety checks for 'shl', 'ashr' and 'lshr'.
May 2 2018, 11:44 PM

May 1 2018

opaparo committed rL331265: [InstCombine] new testcases for OverflowingBinaryOperators and….
[InstCombine] new testcases for OverflowingBinaryOperators and…
May 1 2018, 7:30 AM
opaparo committed rL331257: [InstCombine] Adjusting bswap pattern matching to hold for And/Shift mixed case.
[InstCombine] Adjusting bswap pattern matching to hold for And/Shift mixed case
May 1 2018, 5:29 AM
opaparo closed D45731: [InstCombine] Adjusting bswap pattern matching to hold for And/Shift mixed case.
May 1 2018, 5:29 AM

Apr 27 2018

opaparo added a comment to D45731: [InstCombine] Adjusting bswap pattern matching to hold for And/Shift mixed case.

Thanks for coming back to this. I think this patch is fine as one more match of a potential IR variant, but...
Since the time we started the discussion in D41353, there was a proposal to improve matching for ctpop and ctlz in D45173. I'm curious what others think about the fact that we match bswap/bitreverse in regular instcombine. Would it be better to match all of those intrinsics in the same place? Do we distinguish these based on how often we expect them to occur?

Apr 27 2018, 6:58 AM
opaparo added a comment to D45842: [Reassociate] swap binop operands to increase factoring potential.

What are your plans for deeper associate-operand trees, such as in the tests I've suggested at D41574? Will you support such cases?

Apr 27 2018, 6:56 AM · Restricted Project

Apr 19 2018

opaparo commandeered D45731: [InstCombine] Adjusting bswap pattern matching to hold for And/Shift mixed case.

Restoring the ownership of this revision.

Apr 19 2018, 11:48 AM
opaparo set the repository for D45731: [InstCombine] Adjusting bswap pattern matching to hold for And/Shift mixed case to rL LLVM.
Apr 19 2018, 11:42 AM

Apr 17 2018

opaparo created D45731: [InstCombine] Adjusting bswap pattern matching to hold for And/Shift mixed case.
Apr 17 2018, 11:44 AM

Jan 23 2018

opaparo added a comment to D41574: [Transforms] Adding a WeakReassociate pass.

Ping

Jan 23 2018, 6:53 AM
opaparo added a comment to D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction.

Ping

Jan 23 2018, 6:53 AM
opaparo added a comment to D41354: [InstCombine] Extending InstructionSimplify.

Ping

Jan 23 2018, 6:50 AM

Jan 18 2018

opaparo abandoned D34396: Adding code padding for performance stability - first policy (BranchesWithSameTargetAvoidancePolicy).
Jan 18 2018, 3:02 AM

Jan 10 2018

opaparo added a comment to D41574: [Transforms] Adding a WeakReassociate pass.

It's not entirely clear to me why you need another reassociate pass.
We should really try to share as much code as possible with the existing one instead of adding maintenance burden.

Although both of the passes perform reassociation, they are essentially very different.
The Reassociate pass aggressively modifies the topography of the expression tree, basically rewriting it, while the WeakReassociate pass uses the existing topography.
The Reassociate pass creates new instructions while the WeakReassociate pass does not.
Moreover, their method of reassociation is completely different. To modify the existing Reassociate pass to supports reassociation of the form (a_0 + a_1) + (a_2 + a_3) -> (a_0 + a_2) + (a_1 + a_3) will take a great amount of changes, changes that will almost certainly collide with existing behavior and test cases. For example, how will it decide whether (a_0 + a_1) + (a_2 + a_3) will be transformed into a_0 + (a_1 + (a_2 + a_3)) like its current behavior or to (a_0 + a_2) + (a_1 + a_3) like the new behavior?
The existing Reassociate pass lacks, by design, several kinds of reassociations, kinds that I believe can find a home in the new pass.
IMO, maintaining one big pass that has two distinguished purposes can prove to be a greater burden than maintaining two smaller passes, each dedicated to its own purpose.

(Only trying to understand) From what I see the main consumers of reassociation are Global Value Numbering to discover more equivalences and instruction combining.
The former can probably embed reassociation while performing equivalence finding (in fact, there are algorithms where this happens), the latter already performs a fair amount of "canonicalization" (sometimes, too much). I wonder where your pass fits/what's the use case? Do you plan to enable it as part of the default pipeline?

Thanks,

Davide

The new pass is a kind of preparation pass for InstCombine. It complements changes introduced at D39421, as well as other InstCombine transformations. Checkout my lit tests, they illustrate several scenarios.
I believe that this pass will open a gate to other reassociations required by InstCombine transformations, reassociations that can't be expressed in means of the existing Reassociate pass.

Jan 10 2018, 9:47 AM
opaparo added a comment to D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction.

Ping

Jan 10 2018, 9:44 AM
opaparo added a comment to D41354: [InstCombine] Extending InstructionSimplify.

Ping

Jan 10 2018, 9:44 AM
opaparo added a comment to D41233: [InstCombine] Canonizing 'and' before 'shl'.

Ping

Jan 10 2018, 9:43 AM

Dec 27 2017

opaparo added a comment to D41574: [Transforms] Adding a WeakReassociate pass.

It's not entirely clear to me why you need another reassociate pass.
We should really try to share as much code as possible with the existing one instead of adding maintenance burden.

Although both of the passes perform reassociation, they are essentially very different.
The Reassociate pass aggressively modifies the topography of the expression tree, basically rewriting it, while the WeakReassociate pass uses the existing topography.
The Reassociate pass creates new instructions while the WeakReassociate pass does not.
Moreover, their method of reassociation is completely different. To modify the existing Reassociate pass to supports reassociation of the form (a_0 + a_1) + (a_2 + a_3) -> (a_0 + a_2) + (a_1 + a_3) will take a great amount of changes, changes that will almost certainly collide with existing behavior and test cases. For example, how will it decide whether (a_0 + a_1) + (a_2 + a_3) will be transformed into a_0 + (a_1 + (a_2 + a_3)) like its current behavior or to (a_0 + a_2) + (a_1 + a_3) like the new behavior?
The existing Reassociate pass lacks, by design, several kinds of reassociations, kinds that I believe can find a home in the new pass.
IMO, maintaining one big pass that has two distinguished purposes can prove to be a greater burden than maintaining two smaller passes, each dedicated to its own purpose.

(Only trying to understand) From what I see the main consumers of reassociation are Global Value Numbering to discover more equivalences and instruction combining.
The former can probably embed reassociation while performing equivalence finding (in fact, there are algorithms where this happens), the latter already performs a fair amount of "canonicalization" (sometimes, too much). I wonder where your pass fits/what's the use case? Do you plan to enable it as part of the default pipeline?

Thanks,

Davide

Dec 27 2017, 7:23 AM

Dec 26 2017

opaparo added a comment to D41574: [Transforms] Adding a WeakReassociate pass.

It's not entirely clear to me why you need another reassociate pass.
We should really try to share as much code as possible with the existing one instead of adding maintenance burden.

Dec 26 2017, 5:07 AM
opaparo updated the diff for D41574: [Transforms] Adding a WeakReassociate pass.

Now using the constant's value rather than its pointer value for enforcing total order on LeafClasses. This is needed in order to ensure deterministic output of the compiler (I actually got different results when running with different OSs before this change).

Dec 26 2017, 12:42 AM

Dec 25 2017

opaparo created D41574: [Transforms] Adding a WeakReassociate pass.
Dec 25 2017, 5:26 AM

Dec 21 2017

opaparo updated the diff for D41354: [InstCombine] Extending InstructionSimplify.
  1. Generalized InstructionSimplify's IsSafeOverflowingBinaryOperator to handle 'add', 'sub' and 'mul' in addition to 'shl'. This was done by refactoring existing behavior out of InstCombine's internals into ValueTracking, so now both InstCombine and InstructionSimplify use it.
  2. Added tests to cover possible cases. For each OverflowingBinaryOperator ('shl', 'add', 'sub' and 'mul') I've created tests according to the Cartesian product of four factors: the op has the nuw flag, the op has the nsw flag, there actually is an unsigned overflow, there actually is a signed overflow. The key idea is to check that optimizations occur if and only if the overflow flag is absent or no overflow is guaranteed. also note that the file name had changed from select-bitext-bitwise-ops.ll to select-obo-peo-ops.ll to better reflect the nature of the tests it contains.
Dec 21 2017, 8:03 AM

Dec 19 2017

opaparo added inline comments to D41233: [InstCombine] Canonizing 'and' before 'shl'.
Dec 19 2017, 6:53 AM
opaparo updated the diff for D41353: [InstCombine] Adjusting bswap pattern to the new masked shl canonization.

Added a comment explaining the change of pattern in the bswap idiom detection.

Dec 19 2017, 5:07 AM

Dec 18 2017

opaparo updated the diff for D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction.

Rebasing on top of the new parent review D41354

Dec 18 2017, 8:01 AM
opaparo created D41354: [InstCombine] Extending InstructionSimplify.
Dec 18 2017, 7:59 AM
opaparo created D41353: [InstCombine] Adjusting bswap pattern to the new masked shl canonization.
Dec 18 2017, 7:48 AM
opaparo updated the diff for D41233: [InstCombine] Canonizing 'and' before 'shl'.
  1. Relaxing the canonization condition: the masked shl will be canonized to the new form only if the 'shl''s 0th operand is not a shift instruction. This is due to other, better optimization that will be able to kick in.
  2. Reverted the InstructionSimplify and bswap changes. Those will be added in two different reviews.
Dec 18 2017, 7:40 AM

Dec 14 2017

opaparo added a comment to D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction.

Both this and D38037 are trying to start a pattern match with an 'or', but I'm curious if there's a 'trunc' in the larger source that creates these patterns? Either way, we're missing something bigger than patterns that start with 'or'.

For example, I was looking at PR31667:
https://bugs.llvm.org/show_bug.cgi?id=31667

Name: sub_mask_shift

%and1 = lshr i32 %x, 3
%shr1 = and i32 %and1, 8191
%and2 = lshr i32 %x, 1
%shr2 = and i32 %and2, 32767
%r = sub i32 %shr1, %shr2

...which was filed as a backend bug, but we wouldn't handle that in IR either:
https://rise4fun.com/Alive/id4

So I think there's some more general sequence that we want to capture and optimize, but it may be difficult to justify as part of instcombine?

Note that there is a proposal for a new pass where all of these might find a home:
D38313

Dec 14 2017, 6:12 AM
opaparo abandoned D38037: [InstCombine] Compacting or instructions whose operands are shift instructions.

These changes will be handled by D41233, D39421 and perhaps a few more future changes.

Dec 14 2017, 6:06 AM
opaparo updated the diff for D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction.

Rebasing on parent revision and adding more test cases.

Dec 14 2017, 5:53 AM
opaparo set the repository for D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction to rL LLVM.
Dec 14 2017, 5:49 AM
opaparo created D41233: [InstCombine] Canonizing 'and' before 'shl'.
Dec 14 2017, 5:32 AM
opaparo committed rL320692: Inserting several lit tests to reflect current behaviour.
Inserting several lit tests to reflect current behaviour
Dec 14 2017, 4:00 AM

Dec 5 2017

opaparo added a reviewer for D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction: AndreiGrischenko.
Dec 5 2017, 3:46 AM
opaparo added a reviewer for D38037: [InstCombine] Compacting or instructions whose operands are shift instructions: AndreiGrischenko.
Dec 5 2017, 3:46 AM

Nov 28 2017

opaparo added a comment to D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction.

I think inverting the canonicalization of shl+and would make your first test case optimize without this patch

Could you please explain why? I'm not sure I'm seeing it.

Currently, we end up inverting the canonicalization in the x86 backend (because a smaller constant mask can be created in less instruction bytes), so it would be better to "get it right" here in IR in the first place.
I understand the multi-use case better now with your explanation, so I agree that we want this patch to handle those cases too. But I don't think we should ignore the underlying canonicalization choices just because we know we want to catch the larger patterns.

I agree that this alternative canonization could prove to be beneficial and more correct. However, I feel that this discussion is orthogonal to this patch, and if it would indeed be decided to switch to the new form then some of the code of this patch, along with several other pieces of code, may need to change accordingly.

Nov 28 2017, 9:15 AM

Nov 26 2017

opaparo added a comment to D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction.

Why do we canonicalize shift-left before 'and'?

Ie, shouldn't we prefer this:

define i8 @andshl(i8 %x) {
  %and = and i8 %x, 1
  %shl = shl i8 %and, 3
  ret i8 %shl
}

instead of this:

define i8 @andshl(i8 %x) {
  %and = shl i8 %x, 3
  %shl = and i8 %and, 8
  ret i8 %shl
}

...because doing the 'and' before the shift always uses a smaller constant?

Nov 26 2017, 5:19 AM

Nov 20 2017

opaparo added a reviewer for D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction: lsaba.
Nov 20 2017, 6:38 AM

Nov 13 2017

opaparo updated the diff for D39840: [MC][X86] Code padding for performance stability - Branch instructions and targets alignment.

Replacing std::make_unique with llvm::make_unique.
Adding "skylake-avx512" to the architectures which the optimization will be enabled for.

Nov 13 2017, 11:48 PM
opaparo added inline comments to D39840: [MC][X86] Code padding for performance stability - Branch instructions and targets alignment.
Nov 13 2017, 8:53 AM
opaparo updated the diff for D39840: [MC][X86] Code padding for performance stability - Branch instructions and targets alignment.

Now using MCInstrDesc via MCInstrInfo instead of checking for specific opcodes.
Checked in the test and rebased to make changes more clear.
Added a negative test.
Fixed some typos.

Nov 13 2017, 8:45 AM
opaparo committed rL318041: Inserting a base test for X86 performance nops.
Inserting a base test for X86 performance nops
Nov 13 2017, 7:04 AM

Nov 12 2017

opaparo updated the diff for D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction.

Adding a pattern match for a shift of and ((V&C1)<<C2).
Although and of shift is the canonical form, this new form is also required in some cases. The new test multiuse3 demonstrate such a case.

Nov 12 2017, 11:50 PM

Nov 9 2017

opaparo created D39840: [MC][X86] Code padding for performance stability - Branch instructions and targets alignment.
Nov 9 2017, 6:09 AM

Oct 30 2017

opaparo created D39421: [InstCombine] Extracting common and-mask for shift operands of Or instruction.
Oct 30 2017, 8:24 AM

Oct 25 2017

opaparo added a comment to D38037: [InstCombine] Compacting or instructions whose operands are shift instructions.

I haven't looked much at the reassociation pass, but this seems like something that would be simpler to canonicalize there. Did you look at adding a rule there?

Eg, in the "or_or_shift" test, you have a tree of 'or' ops. If you reassociate those so that similar shifts are paired together, instcombine would manage to fold that using existing patterns (although maybe not optimally yet):

define i32 @or_or_shift(i32 %x) local_unnamed_addr #0  {
  %1 = and i32 %x, 2
  %2 = and i32 %x, 4
  %3 = shl nuw nsw i32 %1, 6
  %4 = lshr exact i32 %1, 1
  %5 = shl nuw nsw i32 %2, 6
  %6 = lshr exact i32 %2, 1
  %7 = or i32 %3, %5    <--- shl with shl
  %8 = or i32 %4, %6    <--- lshr with lshr
  %9 = or i32 %7, %8
  ret i32 %9
}

$ ./opt -instcombine reass.ll -S

define i32 @or_or_shift(i32 %x) local_unnamed_addr {
  %1 = shl i32 %x, 6
  %2 = and i32 %1, 384
  %3 = lshr i32 %x, 1
  %4 = and i32 %3, 3
  %5 = or i32 %2, %4
  ret i32 %5
}
Oct 25 2017, 7:52 AM

Oct 24 2017

opaparo added inline comments to D38037: [InstCombine] Compacting or instructions whose operands are shift instructions.
Oct 24 2017, 5:10 AM
opaparo updated the diff for D38037: [InstCombine] Compacting or instructions whose operands are shift instructions.
  1. Removed the lambda MatchShiftOfAnd as Shift of And is canonicalized to And of Shift, so the lambda MatchAndOfShift is sufficient.
  2. The lambda MatchAndOfShift is now not static.
  3. Removed unnecessary calls to replaceAllUsesWith.
  4. Fixed test documentation.
Oct 24 2017, 5:07 AM

Oct 23 2017

opaparo committed rL316413: [MC] Adding code padding for performance stability - infrastructure. NFC..
[MC] Adding code padding for performance stability - infrastructure. NFC.
Oct 23 2017, 11:16 PM
opaparo closed D34393: [MC] Adding code padding for performance stability - infrastructure. NFC. by committing rL316413: [MC] Adding code padding for performance stability - infrastructure. NFC..
Oct 23 2017, 11:16 PM

Oct 16 2017

opaparo added inline comments to D34393: [MC] Adding code padding for performance stability - infrastructure. NFC..
Oct 16 2017, 1:58 AM
opaparo updated the diff for D34393: [MC] Adding code padding for performance stability - infrastructure. NFC..
  1. MCAsmBackend c'tor now takes a unique_ptr of MCCodePadder instead of a raw pointer
  2. MCCodePaddingContext has a new field: IsBasicBlockReachableViaBranch
  3. Changed names of a function and several variables
Oct 16 2017, 1:52 AM
opaparo retitled D34393: [MC] Adding code padding for performance stability - infrastructure. NFC. from Adding code padding for performance stability - infrastructure to [MC] Adding code padding for performance stability - infrastructure. NFC..
Oct 16 2017, 1:46 AM

Sep 19 2017

opaparo created D38037: [InstCombine] Compacting or instructions whose operands are shift instructions.
Sep 19 2017, 7:12 AM

Aug 20 2017

opaparo added a comment to D34393: [MC] Adding code padding for performance stability - infrastructure. NFC..

Tried the two patches with our internal benchmark -- the alignment related performance issue is still there. The problem disappears with -mllvm -x86-experimental-pref-loop-alignment=5 is used.

Aug 20 2017, 7:30 AM

Aug 17 2017

opaparo added a comment to D34396: Adding code padding for performance stability - first policy (BranchesWithSameTargetAvoidancePolicy).

Do you plan on doing something similar for the DSB decode cache issues that occasionally arise?
Specifically this: https://bugs.llvm.org/show_bug.cgi?id=5615

Aug 17 2017, 5:07 AM

Aug 1 2017

opaparo added a comment to D34396: Adding code padding for performance stability - first policy (BranchesWithSameTargetAvoidancePolicy).

Ping

Aug 1 2017, 12:06 AM
opaparo added a comment to D34393: [MC] Adding code padding for performance stability - infrastructure. NFC..

Ping

Aug 1 2017, 12:06 AM

Jul 12 2017

opaparo updated the diff for D34396: Adding code padding for performance stability - first policy (BranchesWithSameTargetAvoidancePolicy).

Removed unwanted dependencies introduces in my previous patch: MC layer no longer depends on CodeGen, Core or Target. This was achieved by introducing a new structure named MCCodePaddingContext that is being created by AsmPrinter and passed on to the MC layer (all the way to MCCodePadder). This replaces the direct use of TargetMachine, MachineBasicBlock, MachineFunction and MachineLoopInfo by the MCCodePadder.
Note: This time I used "svn diff -x -U999999" to create the patch, so comparison between this patch and the previous one may prove to be challenging (This had to be done for future work and review).

Jul 12 2017, 6:49 AM
opaparo updated the diff for D34393: [MC] Adding code padding for performance stability - infrastructure. NFC..

Removed unwanted dependencies introduces in my previous patch: MC layer no longer depends on CodeGen, Core or Target. This was achieved by introducing a new structure named MCCodePaddingContext that is being created by AsmPrinter and passed on to the MC layer (all the way to MCCodePadder). This replaces the direct use of TargetMachine, MachineBasicBlock, MachineFunction and MachineLoopInfo by the MCCodePadder.
Note: This time I used "svn diff -x -U999999" to create the patch, so comparison between this patch and the previous one may prove to be challenging (This had to be done for future work and review).

Jul 12 2017, 6:47 AM

Jun 26 2017

opaparo added inline comments to D34393: [MC] Adding code padding for performance stability - infrastructure. NFC..
Jun 26 2017, 6:19 AM
opaparo added inline comments to D34396: Adding code padding for performance stability - first policy (BranchesWithSameTargetAvoidancePolicy).
Jun 26 2017, 6:17 AM

Jun 21 2017

opaparo added a comment to D34393: [MC] Adding code padding for performance stability - infrastructure. NFC..

Should we expose this to the .s?

That would make it easy to test codegen because you would just need to
check for things like .pad_bb_begin.

It would make it easier to test the layout as you could use a simple .s
file for that.

Cheers,
Rafael

Jun 21 2017, 8:37 AM

Jun 20 2017

opaparo created D34396: Adding code padding for performance stability - first policy (BranchesWithSameTargetAvoidancePolicy).
Jun 20 2017, 4:57 AM
opaparo created D34393: [MC] Adding code padding for performance stability - infrastructure. NFC..
Jun 20 2017, 4:25 AM

May 17 2016

opaparo updated the diff for D19915: Adding back-end support to two bit scanning intrinsics.

Added 32-bit test for the intrinsics

May 17 2016, 5:19 AM

May 10 2016

opaparo added inline comments to D19915: Adding back-end support to two bit scanning intrinsics.
May 10 2016, 7:32 AM
opaparo updated the diff for D19915: Adding back-end support to two bit scanning intrinsics.
May 10 2016, 7:00 AM

May 9 2016

opaparo updated the diff for D19914: Adding front-end support to several intrinsics (bit scanning, conversion and state reading intrinsics).
May 9 2016, 1:07 AM
opaparo updated the diff for D19914: Adding front-end support to several intrinsics (bit scanning, conversion and state reading intrinsics).
May 9 2016, 12:43 AM
opaparo updated the diff for D19915: Adding back-end support to two bit scanning intrinsics.
May 9 2016, 12:41 AM

May 4 2016

opaparo added reviewers for D19915: Adding back-end support to two bit scanning intrinsics: m_zuckerman, AsafBadouh, igorb.
May 4 2016, 6:23 AM