bmakam (Balaram Makam)
User

Projects

User does not belong to any projects.

User Details

User Since
Sep 19 2014, 10:34 AM (157 w, 21 h)

Recent Activity

Fri, Sep 22

bmakam committed rL313998: [Falkor] Add falkor CPU to host detection.
Fri, Sep 22, 10:48 AM

Wed, Sep 20

bmakam added a comment to D37343: [CGP] Merge empty case blocks if no extra moves are added..

I think this needs more testcases to illustrate the different possibilities here.

Wed, Sep 20, 10:11 AM

Thu, Sep 14

bmakam updated the diff for D37866: [LateJumpThreading] Enable LateJumpThreading right before CGP..

Address Krzysztof's comment.

Thu, Sep 14, 2:30 PM
bmakam added a comment to D36404: Disable jump threading into loop headers.

We need to figure out if LSR is doing something reasonable. It's possible that LSR is actually causing the regression, and old jumpthreading is just serving to obscure the loop structure (thus preventing LSR from doing whatever bad thing it's doing).

Thu, Sep 14, 2:09 PM
bmakam created D37866: [LateJumpThreading] Enable LateJumpThreading right before CGP..
Thu, Sep 14, 2:08 PM
bmakam updated the diff for D37343: [CGP] Merge empty case blocks if no extra moves are added..

Address Eli's comment.

Thu, Sep 14, 9:59 AM
bmakam added inline comments to D37816: Experimental late jump threading pass.
Thu, Sep 14, 8:31 AM
bmakam added inline comments to D37816: Experimental late jump threading pass.
Thu, Sep 14, 8:08 AM

Wed, Sep 13

bmakam added a comment to D36404: Disable jump threading into loop headers.

Have you tried it? Changing jump threading to have early/late versions is trivial, the question is when should the late one run, and if that would help in the first place.

I reverted this change and it recovered the regression. I was going to try re-running the whole jumpthreading pass after latesimplifycfg to see if it would help, but I don't know if that is a good place, because LSR, which runs afterwards, will not see the loop in canonical form, and loopsimplify will again turn it into an irreducible loop. Another place I was thinking of trying was before CGP.

If it'll work right before CGP, that's probably best. That way targets can run IR-level passes that want to look at loops before that.

I tried right after latesimplifycfg and it almost recovers the regression. However, running it right before CGP made the performance of spec2017/perlbench even worse. It seems to be due to some kind of bad interaction with LSR; I haven't dug deeper. Would running right before LSR be reasonable?

Wed, Sep 13, 2:31 PM
bmakam added a comment to D37343: [CGP] Merge empty case blocks if no extra moves are added..

kindly ping.

Wed, Sep 13, 11:56 AM
bmakam added a comment to D36404: Disable jump threading into loop headers.

Have you tried it? Changing jump threading to have early/late versions is trivial, the question is when should the late one run, and if that would help in the first place.

Wed, Sep 13, 8:59 AM
bmakam added a comment to D36404: Disable jump threading into loop headers.

On our internal benchmarks it's mostly neutral, but the benchmark that motivated it improved by 5.3%. We have some extra code, though, that is supposed to deal with similar cases, so some of the impact may be reduced. By default it keeps the behavior unchanged, so for all other architectures it should have no effect.

Wed, Sep 13, 8:12 AM

Sat, Sep 9

bmakam abandoned D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..

Subsumed by D37343

Sat, Sep 9, 10:07 PM
bmakam updated the diff for D37343: [CGP] Merge empty case blocks if no extra moves are added..
  • Added some more checks to allow merging empty cases when we can estimate extra moves are not added.
  • Added test cases.
  • spec2017/perlbench improves by 4.5%. No other improvements or regressions observed in Spec.
Sat, Sep 9, 10:05 PM

Thu, Aug 31

bmakam added a comment to D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..

Is the actual value of the PHI operand relevant here? It looks like some of the testcases on r289988 involve constants, which are materialized during isel (and can be substantially more expensive).

I took a stab at this and created D37343. Please take a look.

Thu, Aug 31, 11:04 AM
bmakam created D37343: [CGP] Merge empty case blocks if no extra moves are added..
Thu, Aug 31, 11:04 AM

Wed, Aug 30

bmakam added a comment to D36900: Re-land MachineInstr: Reason locally about some memory objects before going to AA..

Thanks for the review.

Wed, Aug 30, 8:05 AM
bmakam committed rL312126: Re-land MachineInstr: Reason locally about some memory objects before going to….
Wed, Aug 30, 7:58 AM
bmakam closed D36900: Re-land MachineInstr: Reason locally about some memory objects before going to AA. by committing rL312126: Re-land MachineInstr: Reason locally about some memory objects before going to….
Wed, Aug 30, 7:58 AM

Tue, Aug 29

bmakam updated the diff for D36900: Re-land MachineInstr: Reason locally about some memory objects before going to AA..

Move assert closer to AA->alias call.

Tue, Aug 29, 2:39 PM
bmakam updated the diff for D36900: Re-land MachineInstr: Reason locally about some memory objects before going to AA..

rebase and fix x86 lit test.

Tue, Aug 29, 10:33 AM

Fri, Aug 25

bmakam added inline comments to D37076: [LICM] Allow sinking when foldable in loop.
Fri, Aug 25, 8:12 AM
bmakam added a comment to D37076: [LICM] Allow sinking when foldable in loop.

ContainFolderableUsersInLoop -> ContainFoldableUsersInLoop?

Fri, Aug 25, 8:10 AM

Thu, Aug 24

bmakam updated the diff for D36900: Re-land MachineInstr: Reason locally about some memory objects before going to AA..
Thu, Aug 24, 1:54 PM

Aug 21 2017

bmakam updated the diff for D36900: Re-land MachineInstr: Reason locally about some memory objects before going to AA..
Aug 21 2017, 3:51 PM

Aug 20 2017

bmakam updated the diff for D36900: Re-land MachineInstr: Reason locally about some memory objects before going to AA..

Trim the patch and avoid handling stack references for non-value MMOs to prevent LNT test failures.

Aug 20 2017, 1:33 PM

Aug 18 2017

bmakam created D36900: Re-land MachineInstr: Reason locally about some memory objects before going to AA..
Aug 18 2017, 2:39 PM

Aug 16 2017

bmakam committed rL311008: Revert "MachineInstr: Reason locally about some memory objects before going to….
Aug 16 2017, 7:18 AM

Aug 14 2017

bmakam added a comment to D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..

I have looked into the phi operands closely and I am not convinced that phi operands involving constants have any influence on the profitability of merging empty blocks. I identified empty exit blocks that, when merged with their successors, improved performance a bit, yet when another similar empty exit block was merged with its destination block, performance dropped. The only difference between them is that the successor block was also empty in the case where performance regressed.

Aug 14 2017, 1:22 PM
bmakam closed D34181: MachineInstr: Reason locally about some memory objects before going to AA..

Thanks for taking a look. Rebased and committed in r310825.

Aug 14 2017, 2:46 AM
bmakam committed rL310825: MachineInstr: Reason locally about some memory objects before going to AA..
Aug 14 2017, 2:42 AM

Aug 8 2017

bmakam added a comment to D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..

To clarify, the original version of this patch recovered the full 3% regression, but the new version only recovers 0.7%?

Correct.

Is the actual value of the PHI operand relevant here? It looks like some of the testcases on r289988 involve constants, which are materialized during isel (and can be substantially more expensive).

Thanks Eli, this looks interesting. Another observation is that increasing cgp-freq-ratio-to-skip-merge to 1000 recovers the full 3% regression. I am trying to isolate two things: the exact basic block that, when CGP skips merging it, keeps copies out of the hot path and improves performance through a reduced dynamic instruction count; and the basic block(s) that, when merged, eliminate unnecessary branches and improve performance through better branching/i-cache utilization. This might provide an answer to your question about the relevance of PHI operands.

Aug 8 2017, 10:00 AM
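
The trade-off described in the comment above can be pictured with a small frequency-ratio check. This is only a toy model, not CodeGenPrepare's code: the function name, the direction of the comparison, and the block frequencies are all assumptions made for illustration. Only the option name cgp-freq-ratio-to-skip-merge and the value 1000 come from the comment.

```python
def should_merge_empty_block(pred_freq, block_freq, ratio):
    """Toy threshold (assumed form): merge the empty block unless its
    predecessor is more than `ratio` times hotter than the block itself.

    Keeping a cold empty block gives PHI copies a place to live off the
    hot path; merging it removes a branch but may move copies into the
    hot predecessor.
    """
    return pred_freq <= block_freq * ratio


# Illustrative (made-up) frequencies: a hot predecessor, a cold empty block.
hot_pred, cold_block = 1000, 10

# With a small ratio the merge is skipped, so the empty block survives.
small_ratio_merges = should_merge_empty_block(hot_pred, cold_block, ratio=2)

# With the ratio raised to 1000, as in the experiment above, it is merged.
large_ratio_merges = should_merge_empty_block(hot_pred, cold_block, ratio=1000)
```

Under this model, raising the ratio simply makes the check permit more merges, which is consistent with the observation that a very large ratio changes which blocks survive to CGP's output.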

Aug 7 2017

bmakam added a comment to D36404: Disable jump threading into loop headers.

@bmakam: Isn't this very similar to some of your recent work?

Aug 7 2017, 11:19 AM

Aug 2 2017

bmakam updated the diff for D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..

Update to address Eli's comments.

Aug 2 2017, 1:02 PM
bmakam added a comment to D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..

A gentle ping.

Aug 2 2017, 11:49 AM

Jul 31 2017

bmakam added inline comments to D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..
Jul 31 2017, 8:37 AM

Jul 28 2017

bmakam updated the diff for D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..

Updated patch to address the underlying issue.

Jul 28 2017, 1:48 PM

Jul 20 2017

bmakam planned changes to D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..

The underlying issue is that after loopsimplify creates empty exit blocks, CGP cannot clean them up if they happen to come from a switch case. I am working on an alternative approach to address this issue.

Jul 20 2017, 10:50 AM

Jul 19 2017

bmakam committed rL308422: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can….
Jul 19 2017, 1:55 AM
bmakam closed D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure. by committing rL308422: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can….
Jul 19 2017, 1:55 AM
bmakam removed a dependent revision for D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify.: D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..
Jul 19 2017, 1:50 AM
bmakam removed a dependency for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure.: D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..
Jul 19 2017, 1:50 AM

Jul 18 2017

bmakam added a dependency for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure.: D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..
Jul 18 2017, 3:18 PM
bmakam added a dependent revision for D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify.: D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..
Jul 18 2017, 3:18 PM
bmakam removed a dependent revision for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure.: D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..
Jul 18 2017, 3:18 PM
bmakam removed a dependency for D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify.: D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..
Jul 18 2017, 3:18 PM
bmakam added a dependent revision for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure.: D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..
Jul 18 2017, 3:14 PM
bmakam added a dependency for D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify.: D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..
Jul 18 2017, 3:14 PM
bmakam created D35584: [CGP] Fold empty dedicated exit blocks created by loopsimplify..
Jul 18 2017, 3:14 PM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

The CGP changes are here: D35584

Jul 18 2017, 3:14 PM
bmakam updated the diff for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Split CGP changes into a separate follow on patch, per Eli's request.

Jul 18 2017, 2:26 PM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

I mean the CodeGenPrepare changes (which are deleting blocks generated by LSR) vs the other changes (which modify transforms that run before LSR).

Jul 18 2017, 2:06 PM
bmakam updated the diff for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Minor cleanup. Thanks, Eli.

Jul 18 2017, 2:00 PM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Hmm...

I took a quick look at the pass pipeline (PassManagerBuilder::populateModulePassManager), and it turns out LateSimplifyCFG is false for the last simplifycfg run. That might be the source of your problem?

Jul 18 2017, 1:23 PM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

SwitchToLookupTable is itself part of LateSimplifyCFG; we should fold empty blocks afterwards, I think?

Jul 18 2017, 12:27 PM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

LateSimplifyCFG cannot catch empty case blocks after switch is lowered, so we need the empty block folding in both places.

"After switch is lowered"? What transform are you talking about?

Jul 18 2017, 11:58 AM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Why do we need the empty block folding in both LateSimplifyCFG and CGP?

Empty block folding in CGP occurs very late, after LSR, and is also not aggressive because it is not run iteratively. LateSimplifyCFG cannot catch empty case blocks after switch is lowered, so we need the empty block folding in both places.

Jul 18 2017, 11:38 AM
bmakam updated the diff for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Update to address Eli's comments.

Jul 18 2017, 11:35 AM

Jul 17 2017

bmakam updated the diff for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Minor code clean up, NFCI.

Jul 17 2017, 1:42 PM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Thanks Balaram for posting this patch; in general, the idea of preserving the canonical form of the loops looks good.

Any idea why these benchmarks regressed?

spec2006/bzip2:ref -2.18
spec2017/omnetpp:ref -1.2
spec2006/perlbench:ref -1.07

Jul 17 2017, 8:38 AM

Jul 14 2017

bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

I'm sort of worried this could have unexpected consequences; do you have performance numbers? (LLVM testsuite or SPEC)

My target was unrolling a hot loop in spec2017/gcc. In addition to that unrolling, which yielded a 2% improvement, I observed a loop being interleaved in povray, which yielded a 6-7% improvement.
Here are the full perf results for SPEC on Falkor with O3 config:

Jul 14 2017, 3:06 PM
bmakam updated the diff for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Update latch set and modify a test case. Thanks, Eli.

Jul 14 2017, 2:54 PM
bmakam added a comment to D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..
Jul 14 2017, 9:29 AM
bmakam updated the diff for D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..

Let me see if I can describe the problem and your approach to fixing the issue in my own words.

Currently, JumpThreading and SimplifyCFG avoid threading/merging "empty" loop headers as this would break the canonical form of the loop; the CFG edge being optimized is between the loop header and its successor. Your approach is to also avoid merging the incoming edges (i.e., back edges) to the loop header, again to avoid breaking the canonical form of the loop. Then later, in late-SimplifyCFG and CodeGenPrepare, you more aggressively remove these empty blocks.

Sound about right?

Thanks Chad,
That's right.

Jul 14 2017, 9:28 AM
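
The approach summarized in the exchange above can be sketched on a toy control-flow graph. This is a minimal illustration under stated assumptions, not the SimplifyCFG implementation: the dictionary-based CFG, the block names, and both helper functions are hypothetical. The idea shown is that an early pass declines to fold an empty block when either the block or its successor is a loop header, while a late pass is free to fold it.

```python
def find_loop_headers(cfg, entry):
    """Return blocks that are the target of a back edge in a DFS from entry."""
    headers, on_stack, visited = set(), set(), set()

    def dfs(block):
        visited.add(block)
        on_stack.add(block)
        for succ in cfg.get(block, []):
            if succ in on_stack:
                headers.add(succ)  # edge to a DFS ancestor: a back edge
            elif succ not in visited:
                dfs(succ)
        on_stack.discard(block)

    dfs(entry)
    return headers


def may_fold_empty_block(block, succ, loop_headers, late):
    """Decide whether an empty block that only branches to `succ` may be folded.

    Early passes keep empty loop headers and empty blocks feeding a loop
    header (for example a single latch), so the loop stays canonical.
    A late pass no longer needs that form and may fold them.
    """
    if late:
        return True
    return block not in loop_headers and succ not in loop_headers


# Toy CFG: "latch" is an empty block carrying the only back edge to "header";
# "pad" is an empty block outside the loop.
cfg = {
    "entry": ["header"],
    "header": ["a", "b"],
    "a": ["latch"],
    "b": ["latch", "pad"],
    "latch": ["header"],
    "pad": ["exit"],
    "exit": [],
}
headers = find_loop_headers(cfg, "entry")
```

Folding "latch" early would leave two back edges ("a" and "b" both branching to "header"), which is the loss of canonical form the patch tries to avoid; folding "pad" is harmless at any point.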
bmakam created D35411: [SimplifyCFG] Defer folding unconditional branches to LateSimplifyCFG if it can destroy canonical loop structure..
Jul 14 2017, 5:13 AM

Jun 16 2017

bmakam updated the diff for D34181: MachineInstr: Reason locally about some memory objects before going to AA..

Uploading the correct patch. I was being conservative by checking for isConstant only; I can add more cases if you think they are more interesting.

Jun 16 2017, 3:14 PM
bmakam added a comment to D34181: MachineInstr: Reason locally about some memory objects before going to AA..

I see the problem now: a part of the patch that addressed Geoff's comments earlier was missing. I will upload the correct patch for review shortly.

Jun 16 2017, 2:35 PM
bmakam added a comment to D34181: MachineInstr: Reason locally about some memory objects before going to AA..

Any idea what effect this has on performance, if any?

This patch is performance neutral on Falkor/Kryo. It addresses a FIXME that I saw in passing and attempted to fix, as it saves compile time and catches corner cases that are currently not covered.

The changes to the AMDGPU tests need more explanation.

These lit tests were failing due to instruction scheduling changes, so I modified the tests to reflect the new code generation and made the AMDGPU tests more robust.

Jun 16 2017, 1:51 PM
bmakam updated the diff for D34181: MachineInstr: Reason locally about some memory objects before going to AA..

Address Geoff's comments.

Jun 16 2017, 8:51 AM

Jun 14 2017

bmakam added inline comments to D34181: MachineInstr: Reason locally about some memory objects before going to AA..
Jun 14 2017, 3:33 PM

Jun 13 2017

bmakam created D34181: MachineInstr: Reason locally about some memory objects before going to AA..
Jun 13 2017, 3:44 PM

Apr 11 2017

bmakam committed rL299994: [AArch64] Fix scheduling info for INS(vector, general) instruction..
Apr 11 2017, 3:26 PM

Apr 7 2017

bmakam committed rL299810: [AArch64] Refine Falkor Machine Model - Part 3.
Apr 7 2017, 8:42 PM

Apr 4 2017

bmakam committed rL299468: [AArch64] Add missing schedinfo, check completeness for Falkor..
Apr 4 2017, 2:28 PM
bmakam committed rL299456: [AArch64] Refine Falkor Machine Model - Part 2.
Apr 4 2017, 11:54 AM

Mar 31 2017

bmakam committed rL299240: [AArch64] Add new subtarget feature to fold LSL into address mode..
Mar 31 2017, 11:29 AM
bmakam closed D31113: [AArch64] Add new subtarget feature to fold LSL into address mode. by committing rL299240: [AArch64] Add new subtarget feature to fold LSL into address mode..
Mar 31 2017, 11:29 AM
bmakam added a comment to D31113: [AArch64] Add new subtarget feature to fold LSL into address mode..

Thanks for the review Chad,
If there is no objection from others, I will commit this change with the fixes for your comments.

Mar 31 2017, 9:23 AM

Mar 24 2017

bmakam committed rL298768: [AArch64] Refine Falkor Machine Model - Part1.
Mar 24 2017, 9:15 PM

Mar 23 2017

bmakam updated the diff for D31113: [AArch64] Add new subtarget feature to fold LSL into address mode..

Address Chad's comments.

Mar 23 2017, 1:31 PM

Mar 21 2017

bmakam updated the diff for D31113: [AArch64] Add new subtarget feature to fold LSL into address mode..

Address Jun's comments.

Mar 21 2017, 1:56 PM

Mar 17 2017

bmakam created D31113: [AArch64] Add new subtarget feature to fold LSL into address mode..
Mar 17 2017, 8:46 PM

Mar 13 2017

bmakam committed rL297611: [AArch64] Map Sched Read/Write resources for Falkor..
Mar 13 2017, 3:54 AM

Mar 7 2017

bmakam added a comment to D29862: LSR: an alternative way to resolve complex solution.

Regarding other regressions in spec2006.
The new method does not guarantee a perfect solution, so I think it would be fair to apply it if it generally demonstrates better code.
By "generally" I mean that the sum of all LSR solution registers (say, in a benchmark) becomes lower.
I can collect such statistics for your arch (please provide me with the exact options).
If the new method generally selects more registers for LSR solutions, I'll need to fix this.

Mar 7 2017, 8:59 AM

Mar 2 2017

bmakam added a comment to D29862: LSR: an alternative way to resolve complex solution.

Thanks for looking into the regression. I tested D30552 on our AArch64 Kryo target, and for spec2006/hmmer it recovered some of the lost performance; however, it is still regressed by 2%, compared to the 9% regression seen previously with the lsr-exp-narrow flag on by default.

Mar 2 2017, 10:04 PM

Feb 28 2017

bmakam added a comment to D29862: LSR: an alternative way to resolve complex solution.

FWIW, we are seeing a 9% regression in spec2006/hmmer on our AArch64 Kryo target with this flag turned on by default.

Feb 28 2017, 9:51 AM

Jan 26 2017

bmakam committed rL293204: [AArch64] Refine Kryo Machine Model.
Jan 26 2017, 12:21 PM
bmakam closed D29191: [AArch64] Refine Kryo Machine Model by committing rL293204: [AArch64] Refine Kryo Machine Model.
Jan 26 2017, 12:21 PM
bmakam abandoned D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Jan 26 2017, 11:04 AM
bmakam created D29191: [AArch64] Refine Kryo Machine Model.
Jan 26 2017, 11:03 AM

Nov 23 2016

bmakam planned changes to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Nov 23 2016, 7:50 AM

Oct 12 2016

bmakam updated the diff for D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.

Rebase and ping.

Oct 12 2016, 11:02 AM

Oct 4 2016

bmakam retitled D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses from [LoopDataPrefetch/AArch64] Allow selective prefetching of symbolic strided accesses to [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Oct 4 2016, 10:38 PM
bmakam updated the diff for D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.

Restricted to only irregular symbolic strides such as those found in spec2006/mcf and spec2000/gap. Please take a look.

Oct 4 2016, 10:31 PM

Sep 28 2016

bmakam added inline comments to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Sep 28 2016, 11:39 PM
bmakam added inline comments to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Sep 28 2016, 11:06 PM
bmakam added inline comments to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Sep 28 2016, 10:21 PM

Sep 27 2016

bmakam added a comment to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.

Do you have any additional comments on this change?

Sep 27 2016, 10:26 AM

Sep 26 2016

bmakam added a comment to D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.

Hi Balaram,

This seems like a well-made patch: it correctly enables the feature, uses the prefetch when it's profitable, and has good tests.

I'll leave the rest of the review and approval to Adam et al., but from my side, the change looks good.

Sep 26 2016, 11:04 AM
bmakam updated the diff for D24833: [LoopDataPrefetch/AArch64] Allow selective prefetching of irregular symbolic strided accesses.
Sep 26 2016, 3:23 AM