
Jul 18 2017

bmakam updated the diff for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Minor cleanup. Thanks, Eli.

Jul 18 2017, 2:00 PM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Hmm...

I took a quick look at the pass pipeline (PassManagerBuilder::populateModulePassManager), and it turns out LateSimplifyCFG is false for the last simplifycfg run. That might be the source of your problem?

Jul 18 2017, 1:23 PM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

SwitchToLookupTable is itself part of LateSimplifyCFG; we should fold empty blocks afterwards, I think?

Jul 18 2017, 12:27 PM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

LateSimplifyCFG cannot catch empty case blocks after switch is lowered, so we need the empty block folding in both places.

"After switch is lowered"? What transform are you talking about?

Jul 18 2017, 11:58 AM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Why do we need the empty block folding in both LateSimplifyCFG and CGP?

Empty block folding in CGP occurs very late, after LSR, and is also not aggressive because it is not run iteratively. LateSimplifyCFG cannot catch empty case blocks after switch is lowered, so we need the empty block folding in both places.

Jul 18 2017, 11:38 AM
bmakam updated the diff for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Update to address Eli's comments.

Jul 18 2017, 11:35 AM

Jul 17 2017

bmakam updated the diff for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Minor code clean up, NFCI.

Jul 17 2017, 1:42 PM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Thanks, Balaram, for posting this patch. In general, the idea of preserving the canonical form of the loops looks good.

Any idea why these benchmarks regressed:

spec2006/bzip2:ref -2.18
spec2017/omnetpp:ref -1.2
spec2006/perlbench:ref -1.07

Jul 17 2017, 8:38 AM

Jul 14 2017

bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

I'm sort of worried this could have unexpected consequences; do you have performance numbers? (LLVM testsuite or SPEC)

My goal was to unroll a hot loop in spec2017/gcc. In addition to that unrolling, which yielded a 2% improvement, I observed a loop being interleaved in povray, which yielded a 6-7% improvement.
Here are the full perf results for SPEC on Falkor with O3 config:

Jul 14 2017, 3:06 PM
bmakam updated the diff for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Update latch set and modify a test case. Thanks, Eli.

Jul 14 2017, 2:54 PM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..
Jul 14 2017, 9:29 AM
bmakam updated the diff for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Let me see if I can describe the problem and your approach to fixing the issue in my own words.

Currently, JumpThreading and SimplifyCFG avoid threading/merging "empty" loop headers as this would break the canonical form of the loop; the CFG edge being optimized is between the loop header and its successor. Your approach is to also avoid merging the incoming edges (i.e., back edges) to the loop header, again to avoid breaking the canonical form of the loop. Then, later, in late-SimplifyCFG and CodeGenPrepare, you remove these empty blocks more aggressively.

Sound about right?

Thanks Chad,
That's right.
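
As an illustration of the summary above, here is a minimal sketch of why folding an empty latch block can break the canonical loop form, which expects a single back edge into the header. The CFG type and the helper names (countBackEdges, foldEmptyBlock, makeExampleLoop) are hypothetical and model the idea with plain containers; they are not LLVM's data structures or the code in this patch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy CFG (an assumption for illustration): block name -> successor names.
using CFG = std::map<std::string, std::vector<std::string>>;

// Count edges into `header` from any block other than `preheader`.
// In this toy model every such edge is a back edge of the loop.
int countBackEdges(const CFG &cfg, const std::string &header,
                   const std::string &preheader) {
  int n = 0;
  for (const auto &entry : cfg) {
    if (entry.first == preheader)
      continue;
    for (const auto &succ : entry.second)
      if (succ == header)
        ++n;
  }
  return n;
}

// Fold an empty block that ends in an unconditional branch into its
// successor: every predecessor is redirected to that successor and the
// empty block is removed. Returns the modified copy of the CFG.
CFG foldEmptyBlock(CFG cfg, const std::string &empty) {
  auto it = cfg.find(empty);
  assert(it != cfg.end() && it->second.size() == 1 &&
         "expects a block with one unconditional successor");
  const std::string succ = it->second.front();
  cfg.erase(it);
  for (auto &entry : cfg)
    for (auto &s : entry.second)
      if (s == empty)
        s = succ;
  return cfg;
}

// A loop whose two body blocks meet in an empty latch that jumps back
// to the header, so the header has exactly one back edge.
CFG makeExampleLoop() {
  CFG cfg;
  cfg["preheader"] = {"header"};
  cfg["header"] = {"then", "else"};
  cfg["then"] = {"latch"};
  cfg["else"] = {"latch", "exit"};
  cfg["latch"] = {"header"};
  cfg["exit"] = {};
  return cfg;
}
```

In this toy loop, folding "latch" redirects both "then" and "else" straight to "header", so the header ends up with two back edges instead of one. Deferring that fold until the loop passes have run is the idea being confirmed in the exchange above.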

Jul 14 2017, 9:28 AM
bmakam created D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..
Jul 14 2017, 5:13 AM

Jun 16 2017

bmakam updated the diff for D34181: MachineInstr: Reason locally about some memory objects before going to AA..

Uploading the correct patch. I was being conservative by checking for isConstant only; I can add more cases if you think they are interesting.

Jun 16 2017, 3:14 PM
bmakam added a comment to D34181: MachineInstr: Reason locally about some memory objects before going to AA..

I see the problem now: part of the patch that addressed Geoff's earlier comments was missing. I will upload the correct patch for review shortly.

Jun 16 2017, 2:35 PM
bmakam added a comment to D34181: MachineInstr: Reason locally about some memory objects before going to AA..

Any idea what effect this has on performance, if any?

This patch is performance neutral on Falkor/Kryo. It addresses a FIXME that I noticed in passing and attempted to fix, as doing so saves compile time and catches corner cases that are currently not covered.

The changes to the AMDGPU tests need more explanation.

These lit tests were failing due to instruction scheduling changes, so I modified the tests to reflect the new code generation and made the AMDGPU tests more robust.

Jun 16 2017, 1:51 PM
bmakam updated the diff for D34181: MachineInstr: Reason locally about some memory objects before going to AA..

Address Geoff's comments.

Jun 16 2017, 8:51 AM

Jun 14 2017

bmakam added inline comments to D34181: MachineInstr: Reason locally about some memory objects before going to AA..
Jun 14 2017, 3:33 PM

Jun 13 2017

bmakam created D34181: MachineInstr: Reason locally about some memory objects before going to AA..
Jun 13 2017, 3:44 PM

Apr 11 2017

bmakam committed rL299994: [AArch64] Fix scheduling info for INS(vector, general) instruction..
[AArch64] Fix scheduling info for INS(vector, general) instruction.
Apr 11 2017, 3:26 PM

Apr 7 2017

bmakam committed rL299810: [AArch64] Refine Falkor Machine Model - Part 3.
[AArch64] Refine Falkor Machine Model - Part 3
Apr 7 2017, 8:42 PM

Apr 4 2017

bmakam committed rL299468: [AArch64] Add missing schedinfo, check completeness for Falkor..
[AArch64] Add missing schedinfo, check completeness for Falkor.
Apr 4 2017, 2:28 PM
bmakam committed rL299456: [AArch64] Refine Falkor Machine Model - Part 2.
[AArch64] Refine Falkor Machine Model - Part 2
Apr 4 2017, 11:54 AM

Mar 31 2017

bmakam committed rL299240: [AArch64] Add new subtarget feature to fold LSL into address mode..
[AArch64] Add new subtarget feature to fold LSL into address mode.
Mar 31 2017, 11:29 AM
bmakam closed D31113: [AArch64] Add new subtarget feature to fold LSL into address mode. by committing rL299240: [AArch64] Add new subtarget feature to fold LSL into address mode..
Mar 31 2017, 11:29 AM
bmakam added a comment to D31113: [AArch64] Add new subtarget feature to fold LSL into address mode..

Thanks for the review Chad,
If there is no objection from others, I will commit this change with the fixes for your comments.

Mar 31 2017, 9:23 AM

Mar 24 2017

bmakam committed rL298768: [AArch64] Refine Falkor Machine Model - Part1.
[AArch64] Refine Falkor Machine Model - Part1
Mar 24 2017, 9:15 PM

Mar 23 2017

bmakam updated the diff for D31113: [AArch64] Add new subtarget feature to fold LSL into address mode..

Address Chad's comments.

Mar 23 2017, 1:31 PM

Mar 21 2017

bmakam updated the diff for D31113: [AArch64] Add new subtarget feature to fold LSL into address mode..

Address Jun's comments.

Mar 21 2017, 1:56 PM

Mar 17 2017

bmakam created D31113: [AArch64] Add new subtarget feature to fold LSL into address mode..
Mar 17 2017, 8:46 PM

Mar 13 2017

bmakam committed rL297611: [AArch64] Map Sched Read/Write resources for Falkor..
[AArch64] Map Sched Read/Write resources for Falkor.
Mar 13 2017, 3:54 AM

Mar 7 2017

bmakam added a comment to D29862: LSR: an alternative way to resolve complex solution.

Regarding other regressions in spec2006.
The new method does not guarantee a perfect solution, so I think it would be fair to apply it if it generally demonstrates better code.
By "generally" I mean that the sum of all LSR solution registers (say, in a benchmark) becomes lower.
I can collect such statistics for your arch (please provide me with the exact options).
If the new method generally selects more registers for LSR solutions, I'll need to fix this.

Mar 7 2017, 8:59 AM

Mar 2 2017

bmakam added a comment to D29862: LSR: an alternative way to resolve complex solution.

Thanks for looking into the regression. I tested D30552 on our AArch64 Kryo target, and for spec2006/hmmer it recovered some of the lost performance; however, it is still regressed by 2%, compared to the 9% regression seen previously with the lsr-exp-narrow flag on by default.

Mar 2 2017, 10:04 PM

Feb 28 2017

bmakam added a comment to D29862: LSR: an alternative way to resolve complex solution.

FWIW, we are seeing a 9% regression in spec2006/hmmer on our AArch64 Kryo target with this flag turned on by default.

Feb 28 2017, 9:51 AM

Jan 26 2017

bmakam committed rL293204: [AArch64] Refine Kryo Machine Model.
[AArch64] Refine Kryo Machine Model
Jan 26 2017, 12:21 PM
bmakam closed D29191: [AArch64] Refine Kryo Machine Model by committing rL293204: [AArch64] Refine Kryo Machine Model.
Jan 26 2017, 12:21 PM
bmakam abandoned D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Jan 26 2017, 11:04 AM
bmakam created D29191: [AArch64] Refine Kryo Machine Model.
Jan 26 2017, 11:03 AM

Nov 23 2016

bmakam planned changes to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Nov 23 2016, 7:50 AM

Oct 12 2016

bmakam updated the diff for D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.

Rebase and ping.

Oct 12 2016, 11:02 AM

Oct 4 2016

bmakam retitled D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses from [LoopDataPrefetch/AArch64] Allow selective prefetching of symbolic strided accesses to [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Oct 4 2016, 10:38 PM
bmakam updated the diff for D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.

Restricted to only irregular symbolic strides such as those found in spec2006/mcf and spec2000/gap. Please take a look.

Oct 4 2016, 10:31 PM

Sep 28 2016

bmakam added inline comments to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Sep 28 2016, 11:39 PM
bmakam added inline comments to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Sep 28 2016, 11:06 PM
bmakam added inline comments to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Sep 28 2016, 10:21 PM

Sep 27 2016

bmakam added a comment to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.

Do you have any additional comments on this change?

Sep 27 2016, 10:26 AM

Sep 26 2016

bmakam added a comment to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.

Hi Balaram,

This seems like a well-made patch: it correctly enables the feature, uses the prefetch when it's profitable, and comes with good tests.

I'll leave the remainder of the reviews and approval to Adam et al, but from my side, the change looks good.

Sep 26 2016, 11:04 AM
bmakam updated the diff for D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Sep 26 2016, 3:23 AM
bmakam added inline comments to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Sep 26 2016, 3:20 AM
bmakam updated the diff for D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.

Added more test cases.

Sep 26 2016, 3:15 AM
bmakam updated D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Sep 26 2016, 3:12 AM

Sep 23 2016

bmakam added a comment to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.

Irregular streams typically consist of array accesses in which a subscripted variable
appears in one of the subscript positions, such as: A[B[i]].

For example:

for (unsigned i = 0; i < 100; i++)
  A[i + 1] = A[Stride * i] + B[i];

There is something confusing here. Is Stride loop-variant here? Otherwise I don't see how this is a A[B[i]]-style access.

Sorry for the confusion. What I meant was that the address of B[i] can be represented as i*sizeof(B), so sizeof(B) is the loop-invariant stride here.

Sorry but I still don't understand this. Can you please elaborate on where the A[B[i]] style access is in this loop?
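
To make the distinction raised in this exchange concrete, here is a minimal sketch contrasting a strided access with a loop-invariant stride against an A[B[i]]-style indirect access. The function names sumStrided and sumIndirect are hypothetical and are not taken from the patch; the sketch only illustrates the two access patterns under discussion, not how the pass classifies them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Strided read: with a loop-invariant `stride`, the index stride * i is an
// affine function of i, so the next address is known without loading data.
long sumStrided(const std::vector<long> &A, std::size_t stride,
                std::size_t n) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    assert(stride * i < A.size() && "index out of range");
    sum += A[stride * i];
  }
  return sum;
}

// Indirect read: the index into A is itself loaded from B, so the address
// sequence depends on the contents of B rather than on i alone.
long sumIndirect(const std::vector<long> &A,
                 const std::vector<std::size_t> &B) {
  long sum = 0;
  for (std::size_t i = 0; i < B.size(); ++i) {
    assert(B[i] < A.size() && "index out of range");
    sum += A[B[i]];
  }
  return sum;
}
```

In the quoted loop, A[Stride * i] has the shape of the first function when Stride is loop-invariant, and B[i] is read as a value rather than used as a subscript of A, which is the source of the reviewer's question.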

Sep 23 2016, 11:52 AM
bmakam added inline comments to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Sep 23 2016, 11:18 AM

Sep 22 2016

bmakam updated D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Sep 22 2016, 11:20 AM
bmakam added a comment to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.

Irregular streams typically consist of array accesses in which a subscripted variable
appears in one of the subscript positions, such as: A[B[i]].

For example:

for (unsigned i = 0; i < 100; i++)
  A[i + 1] = A[Stride * i] + B[i];

There is something confusing here. Is Stride loop-variant here? Otherwise I don't see how this is a A[B[i]]-style access.

Sep 22 2016, 10:04 AM
bmakam retitled D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses from to [LoopDataPrefetch/AArch64] Allow selective prefetching of symbolic strided accesses.
Sep 22 2016, 9:50 AM

Sep 8 2016

bmakam committed rL280966: [LoopDataPrefetch] Use range based for loop; NFCI.
[LoopDataPrefetch] Use range based for loop; NFCI
Sep 8 2016, 10:16 AM

Sep 1 2016

bmakam abandoned D23854: [LoopAccessAnalysis] Recognize geps that include s/zexts as consecutive memory accesses..

I am abandoning this patch because of the additional complexity involved for a very minimal performance gain. Please feel free to revisit if you think otherwise.

Sep 1 2016, 8:07 AM

Aug 30 2016

bmakam added inline comments to D23854: [LoopAccessAnalysis] Recognize geps that include s/zexts as consecutive memory accesses..
Aug 30 2016, 11:23 AM
bmakam added inline comments to D23854: [LoopAccessAnalysis] Recognize geps that include s/zexts as consecutive memory accesses..
Aug 30 2016, 9:13 AM
bmakam added inline comments to D23854: [LoopAccessAnalysis] Recognize geps that include s/zexts as consecutive memory accesses..
Aug 30 2016, 8:38 AM
bmakam added a comment to D23854: [LoopAccessAnalysis] Recognize geps that include s/zexts as consecutive memory accesses..

The logic looks fine, but I'm wondering if it makes sense to add this functionality directly into SCEV? It might be beneficial to transformations other than SLP and LoopIdiom. What do you think?

Aug 30 2016, 8:36 AM
bmakam updated the diff for D23854: [LoopAccessAnalysis] Recognize geps that include s/zexts as consecutive memory accesses..

Merge the two versions of isConsecutiveAccess into LoopAccessAnalysis and remove the implementation in LoadStoreVectorizer.

Aug 30 2016, 8:34 AM

Aug 25 2016

bmakam added a comment to D23854: [LoopAccessAnalysis] Recognize geps that include s/zexts as consecutive memory accesses..

Balaram,

Here's a quick question before I start looking at the details. Won't we still have two implementations of isConsecutiveAccess after this change? Is something preventing us from replacing LoadStoreVectorizer::isConsecutiveAccess with llvm::isConsecutiveAccess?

Aug 25 2016, 8:24 AM

Aug 24 2016

bmakam updated D23854: [LoopAccessAnalysis] Recognize geps that include s/zexts as consecutive memory accesses..
Aug 24 2016, 3:32 PM
bmakam retitled D23854: [LoopAccessAnalysis] Recognize geps that include s/zexts as consecutive memory accesses. from to [LoopAccessAnalysis] Recognize geps that include s/zexts as consecutive memory accesses..
Aug 24 2016, 3:19 PM

Aug 22 2016

bmakam committed rL279431: [PM] Port LoopDataPrefetch AArch64 tests to new pass manager.
[PM] Port LoopDataPrefetch AArch64 tests to new pass manager
Aug 22 2016, 6:08 AM
bmakam closed D23724: [PM] Port LoopDataPrefetch AArch64 tests to new pass manager by committing rL279431: [PM] Port LoopDataPrefetch AArch64 tests to new pass manager.
Aug 22 2016, 6:08 AM

Aug 19 2016

bmakam retitled D23724: [PM] Port LoopDataPrefetch AArch64 tests to new pass manager from to [PM] Port LoopDataPrefetch AArch64 tests to new pass manager.
Aug 19 2016, 12:21 PM
bmakam abandoned D20030: [AArch64] Add option to disable speculation of triangle whose tail is the only latch block.
Aug 19 2016, 4:09 AM

Aug 18 2016

bmakam added a comment to D20030: [AArch64] Add option to disable speculation of triangle whose tail is the only latch block.

This hasn't gone in yet, just FYI.

Aug 18 2016, 3:32 PM
bmakam abandoned D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

I agree with all your points. FWIW, the idea of using value range for static branch prediction is not new. The idea was first introduced by Jason Patterson in his SIGPLAN'95 paper: "Accurate Static Branch Prediction by Value Range Propagation". So I think there is definitely some value in this. However, I have dropped this from my plate because I have already spent a lot of time trying to improve it over the past several months, so I would rather spend my efforts elsewhere. Thanks for the review.

Aug 18 2016, 3:19 PM

Aug 11 2016

bmakam added a comment to D23052: [Inliner] Add a flag to disable manual alloca merging in the Inliner..

@bmakam: Would you mind downloading and testing this patch? Please do full correctness and SPEC200X performance (our head-to-head methodology with train input would be fine).

Tested on Kryo. Only these SPEC200X benchmarks had non-noise performance gains (runtime) with this change:

spec2006/soplex:train[-2.347%, +3.157%]
spec2006/sphinx3:train[+0.469%, +2.124%]
spec2006/xalancbmk:train[+6.566%, +8.743%]

There were no regressions.

Thanks, Balaram. No correctness issues, correct?

Oh, I forgot to run full correctness. There were no correctness issues in SPEC200X benchmarks. I will run full correctness tests.

Aug 11 2016, 8:38 AM

Aug 9 2016

bmakam added a comment to D23052: [Inliner] Add a flag to disable manual alloca merging in the Inliner..

@bmakam: Would you mind downloading and testing this patch? Please do full correctness and SPEC200X performance (our head-to-head methodology with train input would be fine).

Tested on Kryo. Only these SPEC200X benchmarks had non-noise performance gains (runtime) with this change:

spec2006/soplex:train[-2.347%, +3.157%]
spec2006/sphinx3:train[+0.469%, +2.124%]
spec2006/xalancbmk:train[+6.566%, +8.743%]

There were no regressions.

Thanks, Balaram. No correctness issues, correct?

Aug 9 2016, 8:28 AM
bmakam added a comment to D23052: [Inliner] Add a flag to disable manual alloca merging in the Inliner..

@bmakam: Would you mind downloading and testing this patch? Please do full correctness and SPEC200X performance (our head-to-head methodology with train input would be fine).

Aug 9 2016, 8:12 AM

Jul 13 2016

bmakam added a comment to D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

Thanks for testing this patch out, Tim.

Jul 13 2016, 7:30 AM

Jul 12 2016

bmakam added a comment to D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

FWIW, this heuristic does not assign any branch probabilities based on the range size; it only ranks the commutative binary operands based on the generic assumption that, given two conditions a != 0 and b == 2, a != 0 is more likely to hold than b == 2.

And I still think that's not obviously true. Integers actually used often take a very limited number of values, and this seems like a common idiom to me:

int res = some_func();
if (res < 0)
  llvm_unreachable("WTF happened");
else if (res == 0)
  [...]

This is exactly the idiom I am trying to clarify. This change does not influence the branch direction for an idiom like the one above at all. All this change targets is code like

Jul 12 2016, 3:22 PM
bmakam added a comment to D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

Yuck, I hate heuristics like this. It's not even particularly clear that "range size" correlates well with probability in real code, let alone with what any given branch predictor thinks of that probability.

With all due respect, I think there are some facts that need some clarification. First, for targets which have cheap jump instructions, currently LLVM splits a conditional branch like:

Jul 12 2016, 1:56 PM

Jul 11 2016

bmakam added a comment to D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

If all cores that have FeaturePredictableSelectIsExpensive can also have the new flag, and it makes sense, it could be coalesced into a single flag?

FWIW, PredictableSelectIsExpensive is also set in X86ISelLowering.cpp: PredictableSelectIsExpensive = Subtarget.getSchedModel().isOutOfOrder()
I created a new flag because I could not verify the profitability of this patch on the x86 target. I agree that if it makes sense to enable it for x86, we could coalesce them into a single flag.

Jul 11 2016, 3:12 PM
bmakam added a reviewer for D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition.: sebpop.
Jul 11 2016, 10:46 AM
bmakam added a comment to D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

It seems like there is no objection to turning this on for A57, but there is no official approval yet. If it helps to persuade the community, I have completed running tests on A53, and this change is performance neutral on A53, with no regressions or gains beyond noise. IMHO this seems to be good for AArch64 targets, but I am inclined to leave it enabled only for Kryo because there is no approval from the community. Any thoughts?

Jul 11 2016, 5:09 AM

Jul 7 2016

bmakam added a comment to D20030: [AArch64] Add option to disable speculation of triangle whose tail is the only latch block.

This hasn't gone in yet, just FYI.

Jul 7 2016, 8:46 AM
bmakam updated the diff for D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

rebased.

Jul 7 2016, 8:42 AM

Jul 5 2016

bmakam committed rL274573: Revert r259387: "AArch64: Implement missed conditional compare sequences.".
Revert r259387: "AArch64: Implement missed conditional compare sequences."
Jul 5 2016, 1:31 PM
bmakam abandoned D16978: [InstCombine] Try harder to simplify ~(X & Y) -> ~X | ~Y and ~(X | Y) -> ~X & ~Y when X and Y have more than one uses..

Performance data was not promising, so abandoning this change.

Jul 5 2016, 11:02 AM

Jun 21 2016

bmakam added a comment to D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

The only clear performance differences on A57 are:

Jun 21 2016, 3:21 AM

Jun 20 2016

bmakam added a comment to D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

In principle, this change is target independent because it reassociates binary operands to simplify branches. The reassociation pass is designed for transformations that will help down-the-line optimizations such as constant propagation, GCSE, LICM, PRE, etc., so I moved it down to CGP.

The re-association is target independent, but guessing which branch will be taken probably isn't, as it depends on the branch predictor, which differs wildly across targets and workloads.

I can certainly verify for A57 and know for a fact that it improves spec2006/mcf on A57 as well. However, I am uncertain of reliably testing and verifying on other targets.

At least for A57 would be nice.

The new flag should suffice. It can also allow other target maintainers to test on their arches by adding the feature and benchmarking, and then commit the change if profitable. Only then, if this is universally true, we could remove the flag and make it a generic pass.

Thanks Renato,
Performance runs on A57 are ongoing, and I will update the results once I get them.

Jun 20 2016, 10:20 AM

Jun 17 2016

bmakam updated the diff for D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

Although this is target independent, I have added a feature flag to guard this change. It is currently enabled only for Kryo because I tested only on this target. If this is profitable for other targets, we can add the feature flag to those targets.

Jun 17 2016, 3:13 PM
bmakam added a comment to D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

In principle, this change is target independent because it reassociates binary operands to simplify branches. The reassociation pass is designed for transformations that will help down-the-line optimizations such as constant propagation, GCSE, LICM, PRE, etc., so I moved it down to CGP.
I can certainly verify for A57 and know for a fact that it improves spec2006/mcf on A57 as well. However, I am uncertain of reliably testing and verifying on other targets.

Jun 17 2016, 8:24 AM

Jun 16 2016

bmakam added a comment to D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

Thanks Renato,

Jun 16 2016, 8:24 AM
bmakam updated the diff for D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

Clean up code. NFCI.

Jun 16 2016, 2:20 AM

Jun 15 2016

bmakam added a comment to D20030: [AArch64] Add option to disable speculation of triangle whose tail is the only latch block.

Thanks for the review Renato.

Jun 15 2016, 12:48 PM
bmakam added a comment to D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

Gentle ping. Fixes PR21600

Jun 15 2016, 12:46 PM

Jun 13 2016

bmakam updated the diff for D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition..

update lit tests.

Jun 13 2016, 2:16 PM
bmakam added a comment to D20030: [AArch64] Add option to disable speculation of triangle whose tail is the only latch block.

Thanks for testing, Kristof. I have pushed D21299, which will fix the regressions and will help enable this change by default for all the subtargets. Once D21299 lands, I expect to see an 8% improvement in spec2006/mcf with this patch.

Jun 13 2016, 11:16 AM
bmakam retitled D21299: [Codegen Prepare] Swap commutative binops before splitting branch condition. from to [Codegen Prepare] Swap commutative binops before splitting branch condition..
Jun 13 2016, 9:43 AM

May 24 2016

bmakam added a comment to D20030: [AArch64] Add option to disable speculation of triangle whose tail is the only latch block.

Hi Renato,

May 24 2016, 10:27 AM

May 16 2016

bmakam added a comment to D20030: [AArch64] Add option to disable speculation of triangle whose tail is the only latch block.

Hi Balaram,

This seems like a good thing to do overall, not just for Kryo, or when the option is chosen.

I agree and would advocate enabling this by default after some additional testing.

It would be good to know how it performs in vanilla AArch64 cores (A53, A57) so we could enable them by default.

We should be able to get numbers for at least A57, right?

May 16 2016, 7:31 AM

May 13 2016

bmakam added a comment to D20030: [AArch64] Add option to disable speculation of triangle whose tail is the only latch block.

gentle ping.

May 13 2016, 11:06 AM
bmakam removed a reviewer for D20030: [AArch64] Add option to disable speculation of triangle whose tail is the only latch block: llvm-commits.
May 13 2016, 11:06 AM